PC Part list
- Nvidia GeForce RTX 3090, Palit GameRock OC
- Corsair RM1000x Shift (brand new)
- Ryzen 7 5800X
- 32GB Trident Z 3200 DDR4
- MSI X470 Gaming Plus
Problem description
I got this card yesterday from a friend. I knew it had issues, but it had been RMA'd and the vendor claimed it was working perfectly.
When the card is under heavy load, all screens go black, the fans spike to what sounds like 120%, and everything stays stuck until I hard reboot. I can still hear sound from the desktop. No BSOD. The crash seems inevitable at 100% load and mostly happens within a minute. Under desktop use and light gaming it runs perfectly fine.
The GPU is powered by 3x 4+4-pin straight cables, no splitters. The cables and the PSU are all brand new.
The GPU was RMA'd to the shop where it was bought, but they sent it back claiming it passed a stress test. I think it was PCMark, but I can't remember exactly which benchmark they used.
Before this it was tested in another setup, where it didn't output an image at all.
I've noticed a couple of things that look concerning (all tests done running Kombustor):
- GPU core temp is at 82 °C, but the hotspot temperature hovers at 105 °C (maybe spiking higher, hard to tell)
- The Windows event log spews hundreds of nvlddmkm errors every second when the crash occurs: lots of different ID 13 errors and even more ID 0 errors. Some of these include:
  - \Device\Video3 Graphics SM Global Exception on (GPC 4, TPC 3, SM 0): Multiple Warp Errors
  - \Device\Video3 Variable String too Large
  - \Device\Video3 Graphics Exception: ESR 0x525e14=0xffffffff 0x525e10=0xffffffff
  - \Device\Video3 Graphics Exception on GPC 0 ZROP 0: Graphics is hung, FATAL!!
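In case it helps anyone compare logs, here is a rough Python sketch for tallying which GPU units the error messages mention. It assumes the GPC/TPC/SM/ZROP numbers in the message text identify hardware units, which is my reading of the log and not something I've confirmed. The names `SAMPLE`, `units_in` and `tally` are made up for this example, and the sample only contains the four messages quoted above with the device prefix dropped.

```python
import re
from collections import Counter

# Message text from the errors quoted above (device prefix dropped).
SAMPLE = [
    "Graphics SM Global Exception on (GPC 4, TPC 3, SM 0): Multiple Warp Errors",
    "Variable String too Large",
    "Graphics Exception: ESR 0x525e14=0xffffffff 0x525e10=0xffffffff",
    "Graphics Exception on GPC 0 ZROP 0: Graphics is hung, FATAL!!",
]

# A unit name followed by a decimal index, e.g. "GPC 4" or "ZROP 0".
UNIT_RE = re.compile(r"\b(GPC|TPC|SM|ZROP)\s+(\d+)")


def units_in(message):
    """Return the (unit, index) pairs named in one error message."""
    return [(name, int(index)) for name, index in UNIT_RE.findall(message)]


def tally(messages):
    """Count how often each (unit, index) pair appears across messages."""
    counts = Counter()
    for message in messages:
        counts.update(units_in(message))
    return counts


if __name__ == "__main__":
    for (name, index), count in sorted(tally(SAMPLE).items()):
        print(f"{name} {index}: {count}")
```

With a full export of the event log pasted in place of `SAMPLE`, this would show whether the errors keep naming the same unit or are spread across the whole chip. Four messages are far too few to conclude anything either way.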
What "works" so far
I've managed to get the card running stable at a 65% power limit with 100% fan speed through MSI Afterburner. That leaves the core hovering at 76 °C and the hotspot at 97 °C. The core then runs at about 1300 MHz. The reported maximum is 1800 MHz, but I never saw it sit there.
The GPU then draws about 250 W. It also stays stable when I OC the core by +100 MHz and the memory by +500. I haven't bothered going higher since this feels too much like a band-aid fix.
If I raise the power limit to 70%, the core spikes to 82 °C and the hotspot to 105 °C. It still "survives" for a minute or two, whereas at 100% it's down to seconds. This does feel like slightly above the maximum it can handle, though.
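To put the two data points side by side, here is a tiny Python sketch that computes the gap between hotspot and core temperature at each power limit. The numbers are the ones I read off Afterburner and Kombustor above. The names `readings` and `hotspot_delta` are only for this example.

```python
# Readings taken from the tests described above (100% fan speed, Kombustor).
readings = [
    {"power_limit_pct": 65, "core_c": 76, "hotspot_c": 97},
    {"power_limit_pct": 70, "core_c": 82, "hotspot_c": 105},
]


def hotspot_delta(reading):
    """Gap between hotspot and core temperature, in degrees Celsius."""
    return reading["hotspot_c"] - reading["core_c"]


if __name__ == "__main__":
    for reading in readings:
        print(
            f"{reading['power_limit_pct']}% power: "
            f"core {reading['core_c']} C, hotspot {reading['hotspot_c']} C, "
            f"delta {hotspot_delta(reading)} C"
        )
```

That works out to a gap of 21 °C at 65% and 23 °C at 70%. I don't have a known-good reference for this exact card, so I can't say for sure what the gap should be.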
It seems obvious that this is a temperature issue, but how can it be this bad? Does the thermal paste need to be reapplied? Is there some other issue I might not be aware of, like BIOS settings or some kind of firmware?
I see this has been a widespread issue in the past, and I've tried several other fixes, including:
- Reseating GPU
- Reseating and swapping the power cables (also in the PSU outlet)
- BIOS update
- Fresh driver install after DDU
- Various power settings in Windows Power Plan