Except that FurMark is intentionally overutilizing the card to push heat to maximum levels across the board. This isn't a good argument in favor of Nvidia.
I just checked FurMark, letting it run for 30 minutes or so on the 3090 FE, at stock. GDDR6X hit 104C max before settling down to 102C. The card and drivers are fully capable of monitoring these things, so if the memory hits 110C, it's because the card isn't throttling soon enough. FurMark is probably specifically detected, or maybe it's just lighter than mining. Either way, ETH mining at stock settings causes the fan speed to ramp up to 100% after a bit. That's terrible! If the GPU core is at 60C and the memory is at 110C, and the card knows the memory is overheating, it should clock down before that point. Unless Nvidia only considers 110C and above the throttle point, which appears to be the case.
I checked one more GPU (a Colorful 3080 Vulcan), and its memory temps are much lower (96C max) while mining. I suspect the card just has better cooling contact with the memory, so keeping GDDR6X cool can be done. But the first wave of cards, including the Founders Edition, apparently didn't put enough effort into memory cooling. Thermal pads were okay at 14Gbps for GDDR6. 19/19.5Gbps GDDR6X? You probably need to go to thermal paste and direct heatsink contact, without the millimeter-thick pads.
Right now, what I've seen (with a very limited number of cards) is that the 3080/3090 Founders Edition cards are probably the worst of the bunch as far as memory cooling goes. Several other 'typical' card designs also have high memory temps. The fix -- which should arguably be done in drivers, because VBIOS updates are extremely rare -- is pretty simple. Something like:
every 1 second:
    if (GDDR6XTemp > 100C):
        DropClocks5Percent()
Obviously it would need to track current clocks and previous clocks to see if temps are trending down, but the theory is that the GPU would limit clocks earlier. In fact, I'm sure the cards already do something like this, except it's at 110C right now. So change that temperature to 100C -- or issue a statement that 110C is "perfectly fine".
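For the curious, here's a minimal sketch of that loop in Python using the pynvml bindings (nvidia-ml-py). Big caveats: public NVML only exposes the core temperature, not the GDDR6X junction sensor, so read_gddr6x_temp() below is a hypothetical placeholder (it returns the core temp as a runnable stand-in; a real version would need vendor tooling like whatever HWiNFO reads). Locking clocks also requires admin/root. This is just to illustrate the "check every second, shave 5% while hot" idea, not a drop-in fix.

import time
import pynvml

MEM_TEMP_LIMIT_C = 100   # throttle earlier than the apparent 110C trip point
STEP = 0.95              # drop the clock cap 5% per check while hot

def read_gddr6x_temp(handle):
    # HYPOTHETICAL: the GDDR6X junction sensor isn't in the public NVML API.
    # As a runnable stand-in, return the core temp (NVML_TEMPERATURE_GPU);
    # a real version would need a vendor-specific sensor read.
    return pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
max_sm = pynvml.nvmlDeviceGetMaxClockInfo(handle, pynvml.NVML_CLOCK_SM)
cap = max_sm  # current clock ceiling in MHz

try:
    while True:
        if read_gddr6x_temp(handle) > MEM_TEMP_LIMIT_C:
            cap = int(cap * STEP)  # shave 5% off the current cap
            # Requires admin rights; pins the SM clock range to [0, cap] MHz
            pynvml.nvmlDeviceSetGpuLockedClocks(handle, 0, cap)
        elif cap < max_sm:
            cap = min(max_sm, int(cap / STEP))  # ease back up as temps recover
            pynvml.nvmlDeviceSetGpuLockedClocks(handle, 0, cap)
        time.sleep(1)
finally:
    pynvml.nvmlDeviceResetGpuLockedClocks(handle)
    pynvml.nvmlShutdown()

In practice the driver is clearly already running something like this with a 110C trip point; the whole argument is just about moving that number down to 100C.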
Nvidia posted this 10 years ago:
"Furmark is an application designed to stress the GPU by maximizing power draw well beyond any real world application or game."
So a result of 104C in FurMark, of all things, is certainly a non-issue. The same is true in the laptop GPU market: FurMark can kill boards fairly easily despite being a stress test. I would not recommend anyone use that software, even as a stress test. For one, there are far more realistic stress apps out there; for two, manufacturers give a good amount of warning that using such an app can void the warranty. Again, this comes down to using the card outside its intended purpose.
I've been on both the consumer and professional sides of the GPU world for over 10 years.
These numbers would be an issue if this were simply a game, but the tests you've put forth are highly suspect because they literally run the card at intensities it wasn't designed for. If that's the case, it is not a failure on Nvidia's part; it's on the person putting the card under a load that isn't in line with its intended use.
It is essentially no different than someone putting their card in the oven and expecting a refund when they burn the board.
To clarify, I am not saying this isn't an issue, but it is outside the expected use for the card. That means the burden of improving cooling is on the consumer, not Nvidia: FurMark and Ethereum mining are outside the card's normal everyday usage, and the card is not guaranteed to function within limits under those loads.