> Edzieba, first, I apologize for the first sentence in my retort to your post. You are right that one product's specs alone are not a good indicator of performance; however, the comparative analysis I describe below can predict performance as long as all relevant variables are accounted for.

And here we have a shining example of "smaller number bad" in practice. Without direct testing of texture compression differences between GA103 and AD106 (reducing bandwidth demand), cache behaviour differences between GA103 and AD106 (relocating read/write bottlenecks to a different interface), and whether GA103 was even memory bandwidth bottlenecked in the first place, concluding that a smaller number = worse real-world performance is at best premature.
But of course, internet commentators know better than Nvidia's engineers, and making number bigger means making GPU more betterer (and die area for the extra PHY lanes is just free and has no impact on die cost or power budget).
You know there was always this 60s = 1080p, 70s = 1440p, 80s = 4K... now it looks like they're trying to play it like 70s, 80s, 90s, making the 4060 the new 50s really. The 4050 probably won't even launch at this rate; like, what's the point if there don't look to be any specs left to nerf?
Well, the 3060 is a 1080p GPU. Just fine... And the new low end may be $500+.
> Specs may aid in comparisons within the same architecture, but once you start comparing across architectures they are of little value.

Always a good laugh when someone starts their "be reasonable guys" post with a dash of condescension and expects to be taken seriously after.
Specs are a good indicator of performance no matter what, especially when we are comparing a successor product to its predecessor; so pretending people are being idiots for predicting a weak immediate replacement for the 3060 Ti based on the rumoured specs is intellectually disingenuous.
Now, I could be wrong, and maybe Nvidia's engineers have made a leap in, let's say, delta colour compression that blows the 3060 Ti out of the water. Yet at this stage, being concerned and upset at these numbers is far from laughable. Let's wait and see; if the 4060 Ti with these specs proves me wrong, I will be here and own up to it.
> If your numbers predict a gen-on-gen real world performance decrease, then one should immediately be questioning those numbers.

Edzieba, first, I apologize for the first sentence in my retort to your post. You are right that one product's specs alone are not a good indicator of performance; however, the comparative analysis I describe below can predict performance as long as all relevant variables are accounted for.
Let me break out my math skills to see if I can indeed predict performance based on specs. The 3080 has 760 GB/s bandwidth and 8704 shader cores running at a 1710 MHz boost clock; the 4080 has 720 GB/s bandwidth and 9728 shader cores running at a 2505 MHz boost clock. That is -5% bandwidth, +12% shader count, and +46.5% boost clock for the 4080, which gives it a 42% improvement in game frame rate (based on the TechPowerUp average performance relative to other GPUs, i.e. if the 3080 is 100% performance, the 4080 is 142% performance). Using the 4080's clock speed, bandwidth, and shader count values gives us an IPC detriment of -9% for the 4000 series architecture (this makes sense: just as with the Pascal generation, Nvidia sacrificed some IPC in order to clock their shader cores higher).
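For anyone who wants to follow the arithmetic, the ~-9% figure can be reproduced by treating the three spec ratios as multiplicative scaling factors on performance; that modelling choice is my assumption, since the post does not spell out how the deltas are combined:

```python
# Back-calculating the implied Ada Lovelace IPC from 3080 vs 4080 specs.
# Assumption (mine): bandwidth, shader count, and clock scale performance
# multiplicatively; whatever the spec ratios fail to explain is "IPC".
bw    = 720 / 760     # memory bandwidth ratio, 4080 vs 3080 (~0.95)
cores = 9728 / 8704   # shader core count ratio           (~1.12)
clock = 2505 / 1710   # boost clock ratio                 (~1.465)

measured = 1.42       # TechPowerUp relative performance, 3080 = 1.00

ipc = measured / (bw * cores * clock)
print(f"implied IPC factor: {ipc:.2f}")  # prints "implied IPC factor: 0.92"
```

An IPC factor of ~0.92 is roughly the -9% detriment quoted above.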
Now that we know the IPC of the 4000 series architecture, we can calculate the expected performance of the 4060 Ti vs the 3060 Ti. Starting with the 3060 Ti at 100% performance, we subtract 42% for bandwidth, subtract 11.8% for shader cores, subtract 9% for Lovelace's lower IPC compared to Ampere, and add 50.5% for the higher clock speed (using the 4080's boost clock of 2505 MHz, since we don't have data on the 4060 Ti's clock speed, against the 3060 Ti's 1665 MHz boost clock). That leaves a performance detriment of -13% for the 4060 Ti compared to the last-gen 3060 Ti. However, if Nvidia does push the 4060 Ti's clock speed higher than the currently released 4000 series cards, it would take a boost clock of 3200 MHz (at the absolute limit of Ada Lovelace's capabilities, i.e. a 92% increase over the 3060 Ti's boost clock) in order to reach my +10% performance improvement estimate over the 3060 Ti.
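The 4060 Ti projection above, taken literally, just sums the percentage deltas; here it is as a sketch using the post's own figures:

```python
# Additive spec-delta model for 4060 Ti vs 3060 Ti, per the post's method.
# All figures are the post's own numbers, expressed as fractions.
deltas = {
    "bandwidth": -0.42,    # rumoured 4060 Ti bandwidth vs 3060 Ti
    "shaders":   -0.118,   # rumoured shader core count vs 3060 Ti
    "ipc":       -0.09,    # the IPC detriment derived from 3080 vs 4080
    "clock":     +0.505,   # 2505 MHz assumed, vs the 3060 Ti's 1665 MHz
}
projected = 1.0 + sum(deltas.values())
print(f"4060 Ti vs 3060 Ti: {projected - 1:+.1%}")  # prints "-12.3%"
```

Summing the deltas gives about -12%, i.e. the ~-13% detriment quoted above.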
I have also checked whether smaller Nvidia dies are more performance-efficient than larger dies by directly comparing 3060 Ti and 3080 specs; this reveals die size has no effect on performance in itself, so I doubt the 4000 series would be any different.
So like everything in life, the proper use of math in scientific applications does allow us to predict performance.
PS: I stupidly put a lot of time into computing all this, so please only respond if you have constructive criticism of my work, a counterpoint, etc.
                 | 3090 Ti   | 4080     | delta
core count       | 10752     | 9728     | 90%
boost clock      | 1.86 GHz  | 2.52 GHz | 135%
memory capacity  | 24 GB     | 16 GB    | 67%
memory bus width | 384 bit   | 256 bit  | 67%
memory bandwidth | 1008 GB/s | 717 GB/s | 71%
die area         | 628 mm^2  | 379 mm^2 | 60%
> TechPowerUp shows the 4080 performs 16% better than the 3090 Ti. I will admit, this case does jumble my numbers a little bit, but I believe this has more to do with the die size of the 3090 Ti, so perhaps die sizes at the extreme end of reticle size do have a performance impact.

Specs may aid in comparisons within the same architecture, but once you start comparing across architectures they are of little value.
If your numbers predict a gen-on-gen real world performance decrease, then one should immediately be questioning those numbers.
Let's compare the 3090 Ti and 4080 to see if that IPC assumption holds up in practice:
We see the 4080 performing roughly 35% above the 3090 Ti (anywhere from 25% to 50% in some outliers), discounting the effects of DLSS 3; so even if we ignore the memory bandwidth disparity, that delta could be accounted for by clock speed entirely, with no IPC deficit indicated.
                 | 3090 Ti   | 4080     | delta
core count       | 10752     | 9728     | 90%
boost clock      | 1.86 GHz  | 2.52 GHz | 135%
memory capacity  | 24 GB     | 16 GB    | 67%
memory bus width | 384 bit   | 256 bit  | 67%
memory bandwidth | 1008 GB/s | 717 GB/s | 71%
die area         | 628 mm^2  | 379 mm^2 | 60%
At the very least, this should indicate that a dramatic drop in memory bus width and bandwidth is not an issue at all for the Ada Lovelace architecture when it comes to actual performance.
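The clock-speed argument above can be checked directly from the table's own numbers: the clock ratio alone is ~1.35, matching the cited ~35% delta, and the 4080 does this with ~10% fewer cores.

```python
# Sanity check of the "accounted for by clock speed" claim, using the
# table's figures for the 3090 Ti and 4080.
clock_ratio = 2.52 / 1.86    # boost clock ratio, 4080 vs 3090 Ti
core_ratio  = 9728 / 10752   # shader core count ratio, 4080 vs 3090 Ti

print(f"clock scaling alone: {clock_ratio:.3f}")               # prints "1.355"
print(f"clock x cores:       {clock_ratio * core_ratio:.3f}")  # prints "1.226"
```

Clock alone gives ~1.355; folding in the core-count deficit gives ~1.23, so a measured ~1.35 would, if anything, imply an IPC gain rather than a deficit.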
> ISO-frequency benchmarking shows IPC continues to increase gen-on-gen: https://www.tweaktown.com/reviews/1...ark-and-AIDA64:~:text=Cinebench R20 IPC & R23

PS: I think I'm still right about the IPC loss for this generation. My evidence is that every attempt in history to push graphics cards to significantly higher clock speeds required a reduction in IPC to stabilize the shaders, and every attempt to significantly increase clock speed on CPUs required a lengthening of the architecture's pipeline, which decreases IPC.
> Wrong, Raptor Lake's cores do have an IPC loss, but it is hidden by the double-sized L2 cache keeping the pipeline fed more efficiently.

ISO-frequency benchmarking shows IPC continues to increase gen-on-gen: https://www.tweaktown.com/reviews/10222/intel-core-i9-13900k-raptor-lake-cpu/index.html#Cinebench-Crossmark-and-AIDA64:~:text=Cinebench R20 IPC & R23
This is as expected: almost all CPU cores will boost to the ~5GHz mark and have done for a good decade, so performance increases come more from architectural improvements than from frequency scaling. Pipeline scaling may increase latency, but does not necessarily impact IPC, as almost any modern CPU will have multiple dispatches within a pipeline active at any one time (i.e. any given stage will be able to pick up the next parcel of work after completing the previous, rather than waiting for a job to complete the entire pipeline before the next is dispatched). Similar to how FPS and frame times are not simple inverses of each other, due to multiple frames being in flight at once.
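The pipelining point above can be illustrated with a toy model: a deeper pipeline raises per-instruction latency (more cycles before the first result emerges), but steady-state throughput stays near one instruction per cycle because every stage immediately picks up the next parcel of work. The stage counts here are illustrative, not real CPUs, and the model ignores stalls and flushes.

```python
# Toy in-order pipeline model: cycles to retire N instructions through a
# pipeline of a given depth, assuming no stalls or flushes.
def pipeline_cycles(stages: int, instructions: int) -> int:
    # The first instruction takes `stages` cycles to traverse the pipe;
    # after that, one instruction completes per cycle.
    return stages + (instructions - 1)

for depth in (10, 20):
    cpi = pipeline_cycles(depth, 1000) / 1000
    print(f"{depth}-stage pipeline: {cpi:.4f} cycles per instruction")
# prints:
# 10-stage pipeline: 1.0090 cycles per instruction
# 20-stage pipeline: 1.0190 cycles per instruction
```

Doubling the depth barely moves the cycles-per-instruction figure once the pipe is full, which is why pipeline length by itself does not dictate IPC.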
> Not borne out by actual benchmarking. And since the L2 cache is part of the core, deciding to arbitrarily exclude it makes about as much sense as saying "Raptor Lake has half the floating point performance if you remove half the FPUs!"

Wrong, Raptor Lake's cores do have an IPC loss, but it is hidden by the double-sized L2 cache keeping the pipeline fed more efficiently.
> Totally agree, but we were talking about the pipeline, so it made sense to bring up the cache.

Not borne out by actual benchmarking. And since the L2 cache is part of the core, deciding to arbitrarily exclude it makes about as much sense as saying "Raptor Lake has half the floating point performance if you remove half the FPUs!"