Question: Peculiar problem with RTX 3090?

Status
Not open for further replies.

Trebiane

Distinguished
May 20, 2013
So I have a very peculiar problem with my system, more specifically with the GeForce RTX 3090...

Specs: i7-9700K, 32 GB RAM, RTX 3090, Samsung 970 EVO Plus SSD, a bunch of Corsair LL120 fans with a couple of Corsair fan controllers, a SATA SSD, 3 HDDs, and an H110i SE 240mm liquid cooler.

I was one of the lucky ones to snag a 3090 FE at MSRP, and I had been gaming with it without problems until about two months ago (maybe even earlier, I'm not exactly sure). Then I started getting BSODs (mostly WHEA_UNCORRECTABLE_ERROR), but none of them seem to get recorded: the BSOD shows for a split second, or not at all, before the computer reboots. At least that's my theory, because these crashes never showed up in MEMORY.DMP when I checked it with the BlueScreenView app.

CPU temps only reach 80-90 °C during the toughest torture tests (and that's with the fan profile set to quiet); GPU temps never exceed 70 °C. (Memory junction temperature is another story, unfortunately, but as far as I can tell it's around 80-90 °C when the crashes happen.)

FurMark and Prime95 would run fine for up to an hour (I never had the patience to test longer, because whenever something went wrong it happened within 20 minutes at most), but the OCCT power test caused similar BSOD crashes, and the PC couldn't pass any of the 3DMark tests. So I started testing components one by one. Like I said, the CPU completed benchmarks fine. I swapped out the RAM sticks and the problem was still there, but then I swapped my 3090 for a 2080 Ti and voila, no more crashes. Everything was fine and I could play all my games.

Before swapping the GPUs, I checked Event Viewer, and all the crashes happened right after this event: "File System Filter 'npsvctrig' (10.0, 2025-01-06T05:41:12.000000000Z) has successfully loaded and registered with Filter Manager." There's a lengthy thread about this somewhere, but unfortunately it doesn't reach a final solution. Most suggestions I had read up to that point pointed to a potential PSU problem, which made sense, since the 2080 Ti uses less power than the 3090. So I suspected the PSU (I also didn't want to believe the problem lay with my 3090). My RM1000 was also 7 years old at this point, so I decided what the heck and ordered an 850 W power supply.
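If you want to see whether the WHEA crashes are at least being logged even when no dump shows up, one option is to query the System log for events from the WHEA-Logger provider. Below is a minimal Python sketch that wraps Windows' built-in `wevtutil` tool to list the newest such events as text. It assumes Python 3.9+ on Windows; `build_whea_query` is just a hypothetical helper name for illustration, not part of any tool, and whether a given crash gets logged there at all is not guaranteed.

```python
import subprocess
import sys

# Provider name Windows uses for hardware-error (WHEA) events in the System log.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def build_whea_query(count: int = 20) -> list[str]:
    """Hypothetical helper: build a wevtutil command listing the newest WHEA events."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",   # XPath filter: only events from the WHEA-Logger provider
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # read in reverse, i.e. newest events first
        "/f:text",       # human-readable text output
    ]


if __name__ == "__main__":
    cmd = build_whea_query()
    if sys.platform == "win32":
        # Run wevtutil directly (no shell) and print whatever it returns.
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)
    else:
        # wevtutil only exists on Windows, so just show the command.
        print("wevtutil is Windows-only; the command would be:")
        print(" ".join(cmd))
```

Running it from a normal command prompt should be enough to read the System log; if the list comes back empty, the crashes are not reaching the WHEA log at all, which is itself useful to know.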

At first everything is fine and the problem looks like it's gone. 3DMark stress tests that previously failed (Fire Strike, Port Royal, etc.) are scoring 98-99%. One of the most problematic titles, Microsoft Flight Simulator, runs smoothly. But after a few days, bam, the PC crashes again, and after that the crashes become more frequent, just like before. The only difference from the previous PSU is that the PC can now pass the various 3DMark stress tests. But it crashes after 20-30 minutes in Doom and after 10 seconds in Cyberpunk 2077, and only runs titles like Stardew Valley and Hades without any issues.

I turn to OCCT once more. The power test either crashes within 20 minutes or starts logging CPU errors. The CPU and memory tests are fine, and the shader test is fine. The VRAM test starts logging an insane number of errors after a certain point, in the billions, but doesn't register any errors on the next run. I'm about to go crazy at this point. Still not wanting to believe my 3090 is the problem, I borrow a 3090 from one of my friends, the exact same FE model. I had tested the system with the 2080 Ti that same day, with great results once again, yet the second 3090 causes the exact same errors!

Since my buddy didn't have any problems with his 3090, I'm now led to believe that even the 850 W PSU is struggling with the 3090, which is insane and definitely shouldn't be a problem, as even the wildest estimates put my system's power draw at around 725 W. Oh, and I haven't formatted the system yet. I hate reformatting, but this honestly feels like a hardware issue, as all the BSODs I get are WHEA errors.
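To put the 725 W vs 850 W numbers in perspective, here is a rough back-of-the-envelope sketch in Python. The 850 W and 725 W figures come from the post above, and 350 W is the RTX 3090 Founders Edition's rated board power. The 1.5x "spike" multiplier is purely a hypothetical illustration value, not a measured or published spec; it only shows how short GPU power spikes could eat into headroom that looks comfortable on paper.

```python
# Rough PSU headroom check using the figures from the post above.
# NOTE: the transient multiplier below is a hypothetical illustration value,
# not a measured or published spec for any card.


def psu_load(psu_watts: float, estimated_draw: float) -> tuple[float, float]:
    """Return (headroom in watts, load as a percentage of PSU capacity)."""
    headroom = psu_watts - estimated_draw
    load_pct = 100.0 * estimated_draw / psu_watts
    return headroom, load_pct


def transient_peak(gpu_watts: float, other_watts: float, multiplier: float) -> float:
    """Peak system draw if the GPU briefly spikes to multiplier x its rated power."""
    return gpu_watts * multiplier + other_watts


if __name__ == "__main__":
    headroom, load = psu_load(850, 725)
    print(f"Steady state: {headroom:.0f} W headroom, {load:.1f}% load")
    # 350 W = RTX 3090 FE rated board power; 375 W = 725 - 350, i.e. the rest
    # of the system according to the post's own estimate.
    peak = transient_peak(350, 375, multiplier=1.5)  # assumed spike factor
    print(f"With an assumed 1.5x GPU spike: {peak:.0f} W vs 850 W capacity")
```

With these inputs the steady-state load is about 85% (125 W headroom), while the assumed spike case reaches 900 W. Treat this only as an illustration of why PSU quality and transient handling matter, not as a diagnosis.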

So, TL;DR --> Started having crashes in graphically demanding games; the more demanding the game, the sooner the crash occurred. Dropped down to my trusty old 2080 Ti, everything was fine. Switched PSUs, went back to the 3090, problems started happening again; switched back to the 2080 Ti, everything was fine. Switched to a DIFFERENT 3090 and the crashes came back. No OC whatsoever; I even turned off the XMP profile. BIOS and all the other drivers are updated. Used DDU multiple times. Tried all your TROUBLESHOOTING 101s, except formatting.
 
Welcome to the forums, newcomer!

With all things said and done, you forgot to mention the specs of your entire build. The most important part is the make and model of the PSU you swapped to. 850W is the wattage; it doesn't tell us anything about the quality of the unit, which is why we ask for the make and model. As for your GPU drivers, did you use DDU to remove the older drivers every time before you swapped GPUs? What are the make and model of your motherboard, and its BIOS version? Which version of Windows 10 are you on? Ideally you should be on version 21H1. Which driver version are you using for the GPU?
 
@Trebiane, any progress or solution to your 3090 issue? I'm seeing similar problems here, after 6 months of stable use of my 3090. Full story below:

18 months ago I built my new PC, and 6 months ago I installed this 3090 Suprim X graphics card, upgrading from a Strix 1080 Ti. Full system specs are at the end. I had no stability issues whatsoever until a few weeks ago, when I began to get an increasing number of BSODs in Windows 10 Pro. The BSODs were all 'WHEA Uncorrectable Error' apart from one 'Clock Watchdog Timeout'. Quite often, instead of a BSOD, the system simply hangs with everything frozen (I can't even move the mouse cursor), and the only way to recover is a hard reset. The time from starting the system to a BSOD or hang has been shrinking, from one every 24 hours to one within 5 minutes. On occasion I have had a hang within the motherboard BIOS, which might rule out a Windows- or driver-related issue.

The stability issues seem unrelated to system load: I've had hangs within the motherboard BIOS, and BSODs at an idle desktop, during plain web browsing, and while stress testing the CPU and GPU. Things I have tried include:

Turning off C-State settings in the BIOS
Selecting the 'High Performance' plan in the Windows 10 Pro power management settings
Default motherboard BIOS settings
Memory XMP off and on
Removing any overclocks
Reducing the active CPU cores to 1
Disabling Hyper-Threading
Using both the 'Gaming' and 'Silent' BIOS selectors on the graphics card
Reinstalling Windows 10
Removing all unnecessary peripherals
Single-rail and multi-rail configurations on the PSU
Resizable BAR support on and off

System temperatures are all normal, as seen in HWMonitor. I'm running the latest motherboard BIOS and VBIOS (for Resizable BAR support). Again, the system was previously stable for 6 months.

When I replace this 3090 Suprim X with my old 1080 Ti, the system returns to complete stability. When I put this 3090 Suprim X into my old PC (Core i5-2400K, Asus P8 P67 Pro motherboard), the graphics card will not POST, i.e. there is no video output. Instead, a red LED (VGA_LED) lights up on the motherboard, indicating a problem with the graphics card. The rest of the system boots normally, and I can connect to and use it remotely. Again, with the 1080 Ti in this old system, all is fine. There may be a motherboard/BIOS/3090 incompatibility, given this motherboard is probably 10 years old, but since it meets the PCIe 2.0 spec and the card is sufficiently powered, I'd expect the system to at least POST and boot properly.

At the moment I'm leaning towards seeking a replacement for this 3090 Suprim X. I'm open to suggestions. Is there anything else I can try? Since the system was stable and has become increasingly unstable, and the 1080 Ti works fine, I can only suspect the graphics card itself, unfortunately.

Full system specs below:

Intel Core i9-9900KS 4 GHz 8-Core Processor
EKWB P280 Custom Water Cooling Loop Kit (For the CPU)
MSI MEG Z390 ACE ATX LGA1151 Motherboard
GeForce RTX 3090 SUPRIM X 24G Graphics Card
G.Skill Trident Z RGB 32 GB (2 x 16 GB) DDR4-4000 CL19 Memory
Samsung 970 EVO Plus 2 TB M.2-2280 NVME Solid State Drive
be quiet! Dark Base Pro 900 Rev. 2 ATX Full Tower Case
be quiet! Dark Power Pro 11 1000 W 80+ Platinum Certified Semi-modular ATX Power Supply
Microsoft Windows 10 Pro OEM 64-bit

Any suggestions / advice / thoughts are welcome.

Thanks.
 
Whatever weird problems we run into, the first question is: did you run DDU in Safe Mode to clean out all the drivers, and then reinstall the downloaded drivers offline? Make sure the internet is unplugged while uninstalling and installing.
 