Nvidia GeForce RTX 4060 Ti 16GB Review: Does More VRAM Help?

What a terrible generation of cards.
It is, and it's basically due to a lack of competition. Nvidia can price its cards however it pleases because, currently, AMD doesn't have a card positioned against the RTX 4060 or 4060 Ti. Will budget AMD cards like the 7700 beat the 4060 on price and performance? We won't know until those cards show up.

But selling a 192-bit card in 2023 is a slap in the face, and charging a premium for it deserves a middle finger right back at Nvidia. This might all change if and when AMD offers competing products to match the 4070/4080/4090 series on price and performance. So far the 7900 XTX gets close to the 4080 but is still priced too high.
 
Also, if you're into emulation (which is technically legal if you're dumping your own games), it's awful due to the limited bus: Doom on Yuzu lost about 20% performance versus last gen's 3060 Ti (which has a full 256-bit bus).
If you mean in 'price'...then what's the point of having both cards?
No, every GPU generation has had "steps" between SKUs.

A Ti/Super was around midway between the higher and lower SKU.

This generation has no such steps, and now there are massive gaps between the SKUs.
 
It's really tempting even though I don't need another gaming rig, but I can buy an HP Omen 45L with a 13900KF and a 4090 for $2,800 plus tax on HP's website. It's just indicative of poor PC sales, with a delivery date of 7-10 days.
 
I'd still wait to see what the 7800 XT brings.

I see people complaining about the 7800 XT's uplift over the 6800 XT, but the lineup is all messed up now because Nvidia wasted AMD with the 4090 and tried to roll people with the 4080 12GB.

I see the 7800 XT this gen as a 4060 Ti 16GB beater.

Even if the 6800 XT ends up on par with the 7800 XT, I think RT will be better on the 7800 XT, and let's not forget the 6800 XT is slowly running out of stock.

I just bought the MSI Gaming Z Trio 6800 XT for a friend's build; I think it was one of, if not the, last 6800 XTs PC Case Gear in Australia had for sale. The 6800 XT Red Dragon is now discontinued, so I think we're seeing the last of the older-gen cards now!

I look forward to seeing how the 7800 XT vs. 4060 Ti 16GB results turn out, and what price point AMD puts on the 7800 XT!
 
There is a place on the market for just about anything, as long as it is priced accordingly. 12GB/192-bit was more or less the norm around the $330 MSRP with the previous generation; it should have at least stayed that way in this one.
Sorry, but in 2023 anything with 8GB is disgusting and 12GB is borderline bad, yet pairing 16GB with a smaller bus that nerfs it anyway, at what, $500-600, is not much better!

Clearly Nvidia just doesn't care and must be laughing their arses off at customers buying this rubbish!

What I don't get, what totally stumps me, is that even if Nvidia gets rolled on price by the 7800 XT and then dumps the price of the 4060 Ti 16GB, early specs suggest the 7800 XT will still be the better buy!
 
Sorry, but in 2023 anything with 8GB is disgusting and 12GB is borderline bad, yet pairing 16GB with a smaller bus that nerfs it anyway, at what, $500-600, is not much better!
I said that almost everything has a place at the right price, not that the current prices make sense. The 4060 Ti would likely have had very positive reviews if it had 12GB/192-bit for $350, and the 8GB 4060 would have been fine at $250.
 
Nvidia made an absurdly greedy decision making the 4060/4060 Ti 8GB instead of 12GB or 16GB. DLSS 3 even eats VRAM, and enabling it in new releases has already been shown to cause problems at 1440p/4K. Nvidia doesn't want these GPUs to last. The upside is nice efficiency, and they're decent GPUs, but with just a little extra VRAM Nvidia would have made everyone happy.
 
This is a massively hard card to recommend, like the 4060, unless it's the only thing you can get that checks your boxes.

The prices of the higher cards make them hard to recommend as well, so I'm recommending my friends skip this generation unless they really need an upgrade.

What a terrible generation of cards.
The 4060 would be fine in its price range if old GPUs weren't oversaturating the market.
 
Nvidia made an absurdly greedy decision making the 4060/4060 Ti 8GB instead of 12GB or 16GB. DLSS 3 even eats VRAM, and enabling it in new releases has already been shown to cause problems at 1440p/4K. Nvidia doesn't want these GPUs to last. The upside is nice efficiency, and they're decent GPUs, but with just a little extra VRAM Nvidia would have made everyone happy.
I think Nvidia was thinking about two things. First, GPUs that are anti-mining, which is what people asked for; AMD did nothing to mitigate that.

Second, they had to think ahead about how many AI GPUs they would have to produce. You can't supply both a ton of AI hardware and gaming cards; AI won, of course.
Hence the 128-bit bus chips, which still perform pretty well for their size. The wider-bus chips will go to AI. TSMC can only do so much.
We're lucky we got what we got. People need to chill out and calm down. Business is business. Chip demand is still very high for many things; gaming is on the sidelines right now.
 
It's a niche card, but wherever a market is large enough, niches are, too.

I'm doing far more CUDA with my Nvidia cards than gaming: that's usually what my kids will do with them, after I'm done, so I keep an eye on both.

And with CUDA, especially with machine learning on CUDA, RAM is everything.

The most wonderful thing about an RTX 3090 or 4090 is those 24GB, which let you fit the 30B Llama model; they're of zero practical use in any game I know. Even the RTX 4090 still struggles with ARK: Survival Evolved at 4K with eye-candy settings (the game supports no DLSS or anything similar), so 8K is out of the question. 24GB is as much use for gaming as 128GB is on the CPU side; it's really about workstation use cases at PC prices. And from that perspective, 16-core Ryzens and RTX x090 cards are crazy economical.

Too bad Meta chose not to release a 30B model for Llama 2, which means the 70B won't fit, and for the 13B I don't need the big card.

If you think this card is overpriced, try running a 70B model: 48 or 80GB of GPU RAM will cost you a kidney in today's market.
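Back-of-the-envelope (my own rule of thumb, not from the review): the weights alone need roughly parameter count times bytes per parameter, plus headroom for the KV cache and activations. A quick sketch:

```python
# Rough, illustrative VRAM estimate for LLM weights only (my own rule of thumb).
# bits_per_param: 16 for FP16, 8 for INT8, 4 for INT4 quantization.
def weights_gb(params_billion: float, bits_per_param: int) -> float:
    return params_billion * bits_per_param / 8  # 1e9 params * bits / 8 bytes = GB

for size in (13, 30, 70):
    print(f"{size}B: FP16 ~ {weights_gb(size, 16):.0f} GB, "
          f"INT8 ~ {weights_gb(size, 8):.0f} GB, INT4 ~ {weights_gb(size, 4):.0f} GB")
# 30B at INT4 ~ 15 GB: fits a 24GB card with room for the KV cache.
# 70B at INT4 ~ 35 GB: needs 48GB+, or the model split across two consumer GPUs.
```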

Even with 16GB of RAM you won't have a lot of fun running LLMs, because they need cores, too. But when it comes to feasibility studies, or if your CS department has to make do with less, it can be the difference between looking on and at least playing in a minor-league game.

And some of the LLM models you can actually run on multiple GPUs with only a PCIe fabric in between, because they were designed on vast scale-out GPU clusters where the fabric isn't always NVLink either: they do much better than your own run-of-the-mill-and-out-of-RAM model.
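For what it's worth, here's a minimal sketch of what that split looks like with the Hugging Face transformers + accelerate stack, assuming two 24GB cards and the bitsandbytes 4-bit path; the model ID and per-GPU memory caps are illustrative, not a recommendation:

```python
# Minimal sketch: shard one model across two GPUs connected only by PCIe.
# Assumes transformers, accelerate, and bitsandbytes are installed; the model
# ID is gated, and the per-GPU memory caps are illustrative for 24GB cards.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,                    # ~35GB of weights at 4-bit
    device_map="auto",                    # layer-wise split across visible GPUs
    max_memory={0: "22GiB", 1: "22GiB"},  # leave headroom on each 24GB card
)

inputs = tokenizer("The RTX 4060 Ti 16GB is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```

No NVLink is assumed here; the layers just hand activations across PCIe, which is fine for inference where only small tensors cross the GPU boundary.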

For that case, this 2.5-slot design is obviously no good. I've gone with PNY for both my RTX 4090, because it's a triple-slot design where most are 3.5 slots, and my 4070, because the PNY Verto is a dual-slot design.

There are used EPYCs out on eBay that let you run a dual (or even quad) GPU setup: 48GB of VRAM is better than a single RTX 3090 or 4090 if the model isn't too highly connected.

The 4070 is only 12GB, but would I pay a premium for 24GB? Bet you I would! Not for gaming, but for ML it's a no-brainer!
Ditto the 4090: getting to 48GB currently means the roughly €8,500 price gap between the RTX 4090 and the A100.

Of course it's HBM vs. GDDR6X and some other "tiny" details, but at, say, €2,000 instead of €1,500 for 48GB of GDDR6X, I'd jump very quickly and probably purchase two, because a deal like that couldn't last.

Nvidia made an interesting decision not to lock their consumer cards out of ML, unlike previous generations, where they culled FP64 support to block HPC [ab]use of consumer hardware.

Putting out a 16GB entry level model may well be a carefully aimed move to avoid AMD or even Intel gaining a lead at the entry level of ML.

And those who consciously choose this card precisely for its ML capabilities know they got off cheap, relatively speaking.
Have you tried 2x RTX 3090 in NVLink?

That's the setup I have (with a Ryzen 5950X + 128GB RAM), but I haven't gotten Meta's Llama 2 running yet.

You've got to use one of the few two-slot 3090s; I went with EVGA XC3 models. They're about $800 each on eBay these days.
 
I think Nvidia was thinking about two things. First, GPUs that are anti-mining, which is what people asked for; AMD did nothing to mitigate that.

Second, they had to think ahead about how many AI GPUs they would have to produce. You can't supply both a ton of AI hardware and gaming cards; AI won, of course.
Hence the 128-bit bus chips, which still perform pretty well for their size. The wider-bus chips will go to AI. TSMC can only do so much.
We're lucky we got what we got. People need to chill out and calm down. Business is business. Chip demand is still very high for many things; gaming is on the sidelines right now.
I think it's more about the fact that external interfaces (meaning, the memory controller connections to the GDDR6/GDDR6X chips) really don't scale at all when shrinking process nodes. SRAM caches also don't scale that well. Logic scales really well, though. So if 4N costs four times as much per square millimeter as 8N, and that's probably a conservative estimate, then each 64-bit memory interface plus cache, which occupies roughly the same area as before, costs four times as much as well.

This is part of why AD102 with six 64-bit interfaces is a 608 mm^2 chip, while AD103 with four 64-bit interfaces is a 379 mm^2 chip. Of course there's more than that. AD102 has 144 SMs, AD103 has 80 SMs. AD102 has more L2 cache and everything else as well. But most of the structures scale in terms of how many there are. So:

AD102 is 60.7% larger than AD103.
AD102 has 80% more SMs.
AD102 has 50% more L2 cache and memory controllers.

With some rough die shot math, each 64-bit memory interface on AD102 takes up about 36 mm^2 of the total die (that's for the controller plus L2 cache). That means fully one third, give or take, of AD102 is devoted to memory controllers and L2 cache!

If AD103 had a 320-bit interface rather than 256-bit, it would have been more like 415 mm^2 in size — 9.5% larger. If AD104 had a 256-bit interface instead of 192-bit, it would have been more like 330 mm^2 (instead of 294 mm^2) — 12% larger. If AD106 had 192-bit instead of 128-bit, it would have been ~224 mm^2 instead of 188 mm^2 — 19% larger. So the lower down the stack, the more costly it becomes.
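A quick sketch of that scaling math, using the ~36 mm^2 per 64-bit controller + L2 slice figure from above (the figure itself is rough die-shot math, not an official number):

```python
# Back-of-the-envelope check of the die-size argument above.
# Assumption: each extra 64-bit controller + L2 slice adds ~36 mm^2 on 4N.
MM2_PER_64BIT_SLICE = 36

# (die, measured area in mm^2, actual bus width, hypothetical wider bus)
dies = [
    ("AD103", 379, 256, 320),
    ("AD104", 294, 192, 256),
    ("AD106", 188, 128, 192),
]

for name, area, bus, wider in dies:
    extra = (wider - bus) // 64
    new_area = area + extra * MM2_PER_64BIT_SLICE
    print(f"{name}: {bus}-bit -> {wider}-bit = ~{new_area} mm^2 "
          f"(+{(new_area / area - 1) * 100:.1f}%)")
# ~415 mm^2 (+9.5%), ~330 mm^2 (+12.2%), ~224 mm^2 (+19.1%)
```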

Could Nvidia have eaten those increased costs? Absolutely. I've noted that AMD's Navi 31 + MCDs basically ends up being similar in overall size to AD103 but still sells for less. But Nvidia decided to maximize profits instead.
 
There is a place on the market for just about anything, as long as it is priced accordingly. 12GB/192-bit was more or less the norm around the $330 MSRP with the previous generation; it should have at least stayed that way in this one.
Or 16GB/256-bit if you have an A770, also at a $330 MSRP.
 
Have you tried 2x RTX 3090 in NVLink?

That's the setup I have (with a Ryzen 5950X + 128GB RAM), but I haven't gotten Meta's Llama 2 running yet.

You've got to use one of the few two-slot 3090s; I went with EVGA XC3 models. They're about $800 each on eBay these days.
If I could start fresh, that sounds like a near-ideal setup to try: I already have the 5950X with 128GB of ECC RAM, one RTX 4090, and another 3090 that I could temporarily wrangle back from one of my sons.

Of course, I'd rather have Tom's Hardware try the permutations, hits, and misses between a dual 4090 setup linked over PCIe and dual 3090s with NVLink, because I can't burn that sort of money that easily to get the hardware (and no one sends me review units).

I started with a GTX 1080 Ti, upgraded to an RTX 2080 Ti for the denser data types and tensor cores, then went for the 3090 relatively late for the 24GB of RAM and INT4, was really happy when it worked so well with Llama 30B, and then upgraded to the 4090 for the extra tokens per second and because I could get a 3-slot design, which would allow me to go dual-GPU with a matching mainboard in the corporate labs.

(BTW: I don't think any of these included NVLink cables; the last ones I remember seeing were from the GTX 980 Ti era, and I don't think those would work. No idea how hard they are to obtain.)

Dual-slot 3090s seemed a pretty theoretical option; those who managed to get them held on tight, and it's only now that you hear about dual-slot 4090s being made in some Chinese back alleys.

So *hint*, *hint*: if Tom's Hardware is looking for a nice piece of non-ChatGPT-4 editorial content, some experimentation with running 70B Llama 2 on dual/triple/quad consumer GPUs should make interesting reading.

And I've heard that the slot extension (riser) cables "RGB" enthusiasts use to change mounting orientation even work at PCIe 4.0 x16, so with a bit of crazy open-chassis testing, even oversized consumer GPUs could be used for some of this experimentation.

Actually, I'm pretty sure my employer would pay me to spend the time on this, and I'd only need Tom's Hardware to send me the hardware... but I'm based in Europe, so there's logistics, among other things.

Then again, LLMs are reported to really need bandwidth for inferencing, so the gap between ~1TB/s on the RTX 4090's GDDR6X and ~1.6TB/s on the A100's HBM2 is a hard limit on the token rate, even without the overhead of split models.
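To put my own napkin math on that: in the decode phase, every generated token has to stream essentially all of the weights through the GPU, so the theoretical ceiling is roughly memory bandwidth divided by model size. Assuming a ~35GB 4-bit 70B model:

```python
# Napkin math (illustrative): decode is roughly memory-bound, so the token-rate
# ceiling is about (memory bandwidth) / (bytes of weights read per token).
def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

MODEL_GB = 35  # ~70B parameters at 4-bit
for name, bw in (("RTX 4090 (GDDR6X, ~1008 GB/s)", 1008),
                 ("A100 40GB (HBM2, ~1555 GB/s)", 1555)):
    print(f"{name}: <= {max_tokens_per_s(bw, MODEL_GB):.0f} tokens/s, "
          "single GPU, ignoring KV-cache traffic")
```

Actual throughput lands below that ceiling, of course, but it shows why the bandwidth gap translates fairly directly into token rate.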

For functional testing in the labs (which is my use case), the bandwidth limit may not be too big of an issue; for running inference at an acceptable mix of cost and performance, it might actually push you toward the data center GPUs. It's for good reason that Nvidia builds and prices them differently, and not everybody is rushing to consumer GPUs to run LLMs... we could just be behind on the data.