Heat_Fan89
That has been my plan all along. I'm even considering going the prebuilt route again, either with a Lenovo Legion or an HP Omen, but I'll need to weigh the cost of a DIY build against a prebuilt.
Maybe, but I pretty clearly recall analysis of RTX 4070 Ti vs RTX 4070 Ti Super vs RTX 4080 showing that the base RTX 4070 Ti wasn't really bottlenecked by its 192-bit memory interface.
I think the biggest chunk of performance uplift will come from GDDR7.
I remember thinking Radeon VII's performance should be nuts with 1 TB/s of HBM2, yet it ended up not being much faster than Vega 64. So, it can be hard to predict these things, without doing a detailed performance analysis that shows just how much a GPU is being held back by memory bandwidth.
GDDR7 + 512 bit bus width should be impressive.
Are you getting paid by the word? I count 1515 of them!
I think the biggest chunk of performance uplift will come from GDDR7.
I keep getting more and more disappointed by the lack of revolutionary graphics thanks to RT. Even fully RT (or path traced if you want to pretend that's what the games are doing) titles don't look massively better than rasterization. Like Black Myth Wukong. Does full RT look better than rasterization? Sure. But it also runs like 1/5 as fast or whatever.
Agreed. I also expect the biggest uplift to be in the Ray Tracing sector. Which company will be preferred, though? Samsung?
Right now, how many companies are able to manufacture GDDR7 memory at scale?
I keep getting more and more disappointed by the lack of revolutionary graphics thanks to RT. Even fully RT (or path traced if you want to pretend that's what the games are doing) titles don't look massively better than rasterization. Like Black Myth Wukong. Does full RT look better than rasterization? Sure. But it also runs like 1/5 as fast or whatever.
For the professional sector, the improvements in RT are definitely important. For games, we still need more stuff that's like Control (where the RT effects were clear and obvious and better) and less like Diablo IV, Avatar, Star Wars Outlaws, etc. where the RT just tanks performance for limited graphical improvements.
Have you not read that it's made on virtually the same process node as the 4090 and only 22% bigger?
Where did you even get the idea that the 5090 would be anything like such an upgrade?
P.S. I find your username a little ironic for a Nvidia fan, given that Red was an ATI thing. Prior to that, AMD's colors were black, white, and green.
More CUDA cores and faster memory will definitely help but I agree that one should believe it when one sees it.
Maybe, but I pretty clearly recall analysis of RTX 4070 Ti vs RTX 4070 Ti Super vs RTX 4080 showing that the base RTX 4070 Ti wasn't really bottlenecked by its 192-bit memory interface.
I remember thinking Radeon VII's performance should be nuts with 1 TB/s of HBM2, yet it ended up not being much faster than Vega 64. So, it can be hard to predict these things, without doing a detailed performance analysis that shows just how much a GPU is being held back by memory bandwidth.
Unless I'm mistaken, Avatar: Frontiers of Pandora uses ray tracing, but it's the usual AMD-promoted use of RT effects, so they offer only very minor overall improvements (and less of a performance hit). Mostly it's just shadows I think.
P.S. It's been a long time since I last played it, but, if memory serves, in addition to being super demanding at Unobtainium settings, Avatar did not even have ray tracing settings - which is quite understandable, considering it's an AMD-promoted game.
Unless I'm mistaken, Avatar: Frontiers of Pandora uses ray tracing, but it's the usual AMD-promoted use of RT effects, so they offer only very minor overall improvements (and less of a performance hit). Mostly it's just shadows I think.
Avatar is the perfect example of a game with RT where the RT effects are basically meaningless AFAICT.
I have not. I literally know nothing about the tech specs... all I know is that the 5000 series is "Coming Soon™."
If I was sitting here with a poor performance GPU I'd probably have paid more attention... but as I have said in previous posts the only reason I'd upgrade to a 5090 is to get a decent resale on my 4090 while I still can.
I didn't... it was an assumption based on the previous gen. If the performance boost is ridiculously low I can see myself waiting on the 6000 series.
Thanks for confirming. Sorry if my post sounded a little harsh, but I was genuinely wondering if there was information out there to the contrary.
I'm there with you on the age, and if you just look at a game (i.e., in a blind taste test sort of way), most people wouldn't know if it used RT / PT or rasterization. If you put them side by side, you can see a few differences. If you do screenshots of select areas, RT and PT can certainly look better. But when you factor in the performance hit it all becomes very hard to justify.
I've got Avatar and I agree.
Maybe it's just me but I wanna say that RT and PT are kinda gimmicky IMO. Sometimes it's just hard to see the visual improvements. I am 50 though and my eyes aren't what they used to be... but I've taken various titles and turned RT on and off and sometimes had difficulty seeing any change... and definitely not what I would consider game making/breaking.
I wish! But no, I just learned how to touch type from Mrs. Pinkerton of West Muskingum High School in 1980.
Are you getting paid by the word? I count 1515 of them!
: D
Seriously, I'd need to run this through ChatGPT and have it summarize for me. At 686 words, the article itself is less than half this long!
@JarredWaltonGPU I know it may be a bit too early, but, given all the rumours we've heard so far, what would be your personal projection on the performance uplift from the 4090 to the 5090?
So, I believe N4P is supposed to do something like 15~20 percent better than N4 on its own. Meaning, same power, you get 15% more performance and density. This is totally just ballparking things, so I may have some numbers wrong but let's take things in parts.
AD102 is 609mm^2 with 76.3 billion transistors, while GB202 is rumored to be 744mm^2. That's a 22% larger die, and if we also assume ~15% higher transistor density, that works out to around 40% more total transistors.
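If you want to sanity-check that die/transistor math, here's a quick back-of-the-envelope sketch (the 744mm^2 die size and the ~15% density gain are rumored/assumed values, so treat the output as a ballpark):
```python
# Rough transistor-count ballpark for GB202 vs. AD102.
# The GB202 die area and the density gain are rumors/assumptions, not confirmed specs.
ad102_area_mm2 = 609        # AD102 die size
ad102_transistors_b = 76.3  # AD102 transistor count, in billions
gb202_area_mm2 = 744        # rumored GB202 die size
density_gain = 1.15         # assumed ~15% density improvement from the newer node

area_scale = gb202_area_mm2 / ad102_area_mm2   # ~1.22x
transistor_scale = area_scale * density_gain   # ~1.40x
gb202_transistors_b = ad102_transistors_b * transistor_scale

print(f"Die area: {area_scale:.2f}x, transistors: {transistor_scale:.2f}x "
      f"(~{gb202_transistors_b:.0f} billion)")
```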
Power is widely rumored to be 600W for the 5090. I think 4090 / AD102 was still somewhat power constrained, so giving it 33% more power to work with will definitely help. And N4P is more efficient as well, so potentially a 40~50% boost in total performance with higher power use.
That would also dovetail into the memory side of things. Even if 5090 was GDDR6X, going from 384-bit to 512-bit means 33% more bandwidth. Conservatively, I expect at least 28 Gbps GDDR7 (which will be readily available in 32 Gbps form). So, 33% higher clocks and a 33% wider interface combine to yield 78% more total bandwidth.
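Here's a minimal sketch of that bandwidth math, assuming the RTX 4090's 21 Gbps GDDR6X as the baseline (the 21 Gbps figure isn't stated above) against the rumored 512-bit, 28 Gbps GDDR7 configuration:
```python
# Peak memory bandwidth: bus width (bits) * per-pin rate (Gbps) / 8 bits per byte.
def peak_bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    return bus_bits * gbps_per_pin / 8

rtx_4090 = peak_bandwidth_gbs(384, 21.0)  # GDDR6X, ~1008 GB/s
rtx_5090 = peak_bandwidth_gbs(512, 28.0)  # rumored GDDR7 config, ~1792 GB/s

print(f"RTX 4090: {rtx_4090:.0f} GB/s")
print(f"Rumored RTX 5090: {rtx_5090:.0f} GB/s ({rtx_5090 / rtx_4090 - 1:+.0%})")
```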
Undoubtedly, the large L2 cache will continue. Will it be improved? Maybe, which could mean even higher effective bandwidth.
Nvidia tends to keep things pretty balanced on the memory bandwidth improvements, so if it actually boosts memory bandwidth by 78% (or more), I suspect it also thinks there are some architectural improvements that make it so the GPU cores need the additional bandwidth.
But there's still the question of AI and RT hardware. Nvidia has been banging that drum since 2018, and while AI has paid off, I'm not convinced RT has. Yet Nvidia keeps putting faster and 'better' RT hardware into every RTX generation. So maybe the RT side of things needs the big boost in bandwidth more than the rasterization needs it? I don't know.
I still expect to see at least a 30~40 percent increase in performance, relative to the 4090, for the right workloads. Meaning, 4K and maxed out settings, possibly with RT, will see sizeable gains. And I think 1080p will be completely CPU limited and 1440p will be largely CPU limited. I also suspect Nvidia will double down on framegen and the OFA will get some needed improvements to make it so that most of the 50-series GPUs will realize an 80~90 percent increase in frames delivered to the monitor with framegen relative to non-framegen.
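As for why framegen lands below a clean 2x, the generated frame eats a slice of render time. A toy sketch, with the overhead figures purely assumed for illustration:
```python
# Frame generation ballpark: one generated frame per rendered frame (2x output),
# minus some per-frame cost for the generation/OFA work. Overhead values are assumptions.
def framegen_fps(base_fps: float, overhead_fraction: float) -> float:
    rendered_fps = base_fps * (1 - overhead_fraction)  # rendering slows slightly
    return 2 * rendered_fps  # every rendered frame gets one generated partner

for overhead in (0.05, 0.10):
    out = framegen_fps(60, overhead)
    print(f"{overhead:.0%} overhead: 60 fps -> {out:.0f} fps ({out / 60 - 1:+.0%})")
# -> roughly +80% to +90% frames to the monitor, in line with the guess above.
```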
So those are my guesses. We'll see if I'm even remotely correct in maybe ~5 weeks. LOL.
You have tons of great knowledge and insight to share, which is why I sometimes wish you'd focus your posts a little more so that more people (myself, at least) would actually read them and gain the benefits of doing so. However, I don't come here to read novelettes.
I wish! But no, I just learned how to touch type from Mrs. Pinkerton of West Muskingum High School in 1980.
And just like Mr. Safford I was already a piano player before that, which helps with dexterity.
Perhaps the loquaciousness comes from being quadrilingual, and that's not counting Latin as a first foreign language.
Here's what TSMC said about N4P vs. N5:
So, I believe N4P is supposed to do something like 15~20 percent better than N4 on its own. Meaning, same power, you get 15% more performance and density. This is totally just ballparking things, so I may have some numbers wrong but let's take things in parts.
AD102 is 609mm^2 with 76.3 billion transistors, while GB202 is rumored to be 744mm^2. That's a 22% larger die, and if we also assume ~15% higher transistor density, that works out to around 40% more total transistors.
Nope, you're double-counting, now! TSMC's performance and efficiency figures tell you either how much more performance at the same power, or how much less power at the same performance. Both of these figures assume the exact same design is translated over to the new process node, which I think is a reasonably safe bet for Blackwell (assuming we're talking about performance per unit of area).
N4P is more efficient as well, so potentially a 40~50% boost in total performance with higher power use.
My numbers assume they will scale it in proportion to everything else. If you assume it'll get even bigger, then you need to deduct some from the theoretical compute estimate, to compensate.
Undoubtedly, the large L2 cache will continue. Will it be improved? Maybe, which could mean even higher effective bandwidth.
Don't forget that Nvidia uses these same GPU dies for inferencing, in server-based products. For instance, the AD102 shows up in their L40 accelerator cards. So, it's possible some of the specs might be aimed more at AI use cases than client rendering workloads. I think it'll be very telling to see what they do with memory clocks, particularly if they indeed go with a 512-bit interface.
Nvidia tends to keep things pretty balanced on the memory bandwidth improvements, so if it actually boosts memory bandwidth by 78% (or more), I suspect it also thinks there are some architectural improvements that make it so the GPU cores need the additional bandwidth.
Between the RTX 4070 Ti, the RTX 4080, and the Supers, we don't have enough data to say? I think the answer is probably there and someone just needs to dig it out.
maybe the RT side of things needs the big boost in bandwidth more than the rasterization needs it? I don't know.
My primary school in Berlin experimented with a syllable-based technique to teach reading and writing, instead of spelling things letter by letter.
P.S. I'm both a former piano player and learned to touch type in school (I forget if it was 7th or 8th grade). I remember when one of my high-school friends noticed I was even touch-typing all the shift-symbols, which I had learned from writing code. Sadly, for most of my writing, it's my brain that tends to be the bottleneck.
Yeah, I said 4N and N4P, which obviously isn't quite right. I'm not sure we have precise statements from TSMC or Nvidia about how much better 4N really is compared to N5 or any other nodes, and the same goes for 4NP. Again, I'm just ballparking and putting out some thoughts here, not trying to be 100% accurate because we absolutely do not know what architectural changes might be happening, and thus could easily be off by 10~20 percent.
Here's what TSMC said about N4P vs. N5:
"N4P will deliver an 11% performance boost over the original N5 technology and a 6% boost over N4. Compared to N5, N4P will also deliver a 22% improvement in power efficiency as well as a 6% improvement in transistor density."We know the "4N" node, used by Ada, was already improved over regular N5, though I'm not sure they ever said by how much. To that end, I think the node actually used by Blackwell is "4NP", which presumably has some improvements over baseline N4P. Anyway, I'd ballpark this using their N5 -> N4P numbers, with the caveat that it might actually overestimate the improvements.
Not really. As you say, it's higher perf at same power, or lower power at same perf. I'm saying we'll get even higher performance while using even more power. If 5090 was 450W, it would get a modest performance bump. Adding transistors increases power, but not linearly, and the voltage-frequency curve also matters.
Nope, you're double-counting, now! TSMC's performance and efficiency figures tell you either how much more performance at the same power, or how much less power at the same performance. Both of these figures assume the exact same design is translated over to the new process node, which I think is a reasonably safe bet for Blackwell (assuming we're talking about performance per unit of area).
Same. However, when you're talking about something as concrete as density, it's definitely not going to be greater than the 6% improvement they quoted between N5 and N4P. Likewise, the performance deltas they quoted between the two are hard upper limits.
Again, I'm just ballparking and putting out some thoughts here, not trying to be 100% accurate because we absolutely do not know what architectural changes might be happening, and thus could easily be off by 10~20 percent.
Their statements were assuming you take the exact same design and merely port it from one node to the next. So, I was taking the figures as essentially transistor-normalized estimates (i.e. assuming the same transistor count). If you're running at the iso-power point in the curve, but you also increase the number of transistors by 29%, then you're automatically using 29% more power.
Not really. As you say, it's higher perf at same power, or lower power at same perf. I'm saying we'll get even higher performance while using even more power.
If you run them at the same frequency, it basically would. The only way you get sub-linear power scaling is if you're building in some assumptions about lower utilization. However, I'm not really concerned about low-utilization games.
Adding transistors increases power, but not linearly,
I believe that's baked into TSMC's power/performance numbers. So, if we start by taking their iso-power data point and treating it as a per-transistor figure, then we get to use their V/F assumptions.
the voltage-frequency curve also matters.
I accounted for that by multiplying that 1.3 figure by either 1.06 or 1.11, which depends on whether 4N was closer to N5 or N4.
if we have about 30% more transistors, running at up to 33% higher total TGP? It's not going to be just ~30% more performance. It will compound, to some extent.
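For reference, here's how that 1.3 x (1.06 or 1.11) multiplication plays out. I'm assuming the ~1.3 transistor figure comes from the rumored 22% larger die times TSMC's 6% density gain, which isn't spelled out above:
```python
# Compounding the transistor increase with the node-only (iso-power) performance gain.
area_scale = 744 / 609     # rumored GB202 vs. AD102 die area, ~1.22x
density_scale = 1.06       # TSMC's quoted N5 -> N4P density gain
transistor_scale = area_scale * density_scale  # ~1.29x to 1.30x

for node_perf in (1.06, 1.11):  # depending on whether 4N was closer to N4 or N5
    total = transistor_scale * node_perf
    print(f"node x{node_perf:.2f}: ~{(total - 1) * 100:.0f}% uplift")
# -> roughly +37% to +44%, the range mentioned further down the thread.
```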
It's a fantasy, unless you think they're really going to go nuts with power. These numbers must be based on something, and you can't tell me where you're getting 30% more perf/W. The part I quoted gave us clear guidance on how much more efficient N4P is. Even the low end of the 1.06 to 1.11 range might be an overestimate, when using 4N as a baseline.
I think 1.3 * 1.3 is definitely the high water mark,
I think it's mainly about capacity and maybe some AI use cases. Especially when you consider how the RTX 6000 Ada and L40 are using a lower memory clock, you have to assume the Blackwell equivalents will also be using lower-clocked memory. So, for running AI workloads at a lower memory clock, maybe the wider datapath is justified.
Nvidia isn't dumb and it wouldn't stick a 512-bit interface with GDDR7 onto GB202 if it wasn't beneficial. Granted, it does need a wider interface just to increase capacity, at least for the professional/data center markets.
The thing I feel most confident about saying is that the perf/$ won't be worse. So, we can look at the launch price rumors to get an idea about the lower end of the performance increase.
And this is why I was very loose and "spitballing" things above. There's just too much we don't know yet.
In my prediction of 37% to 44%, I feel more comfortable with the lower end of that range. As I said, it depends a lot on what they do with power. Will they really push TGP 30% higher? How much more of that power budget is memory going to take? If the memory needs more than a 30% increase, that will leave less for the GPU, itself. That will cut into the range I stated.
Gen on gen, though, 30~50 percent seems the safe bet.
Remind me of the RTX 2080 Ti's improvement vs. GTX 1080 Ti, again?
That's what most Nvidia GPU architectures do...
This is basically my position. There are lots of unknowns about the RTX 5090, but you can be reasonably sure of strong demand at launch and that it will be even more expensive. In terms of perf/$, I think it should be an improvement, but maybe not by a lot.
The argument stated here makes perfect sense for precisely everyone EXCEPT the 4090 buyer! It's EXACTLY THOSE PEOPLE WHO WANT A 4090 RIGHT NOW for whom the argument in this article doesn't really hold, because the only GPU that will beat it will likely be the 5090, which will probably cost more and potentially be less available to purchase when it first arrives. Even CES is months away, and we don't know for certain that the 5090 will be available immediately upon announcement.
It DEPENDS! Lol. I completely hate the way you guys are framing this discussion; it's hardly any better than a YouTube comment section full of AMD fanboys, sorry to say.
In my prediction of 37% to 44%, I feel more comfortable with the lower end of that range. As I said, it depends a lot on what they do with power. Will they really push TGP 30% higher? How much more of that power budget is memory going to take? If the memory needs more than a 30% increase, that will leave less for the GPU, itself. That will cut into the range I stated.
Remind me of the RTX 2080 Ti's improvement vs. GTX 1080 Ti, again?