[SOLVED] Aorus RTX 3090 Xtreme Waterforce WB runs benchmarks for 4 hours without a crash, but then crashes 5 minutes into a game.

Status
Not open for further replies.

OUTBURSTPAL

Prominent
Apr 23, 2021
So I have had my Aorus RTX 3090 Xtreme Waterforce WB for about 4 months now. In game, whether it's after 5 minutes or an hour or 2, I will randomly get a black screen, but the audio stays on and the PC is still technically responsive. (For example, Discord still works and I can still communicate with my friends on a call.) I am about to have a mental breakdown because I am sick of this. I have been battling this for months now and have tried almost every possible fix. Here is what I have done:

Clean video driver install with DDU
Undervolt (made it way more stable, but crashes could still occur; I then started getting artifacts, so I removed it)
Yes, I am using 2 separate cables from the PSU (I am also using cable extensions)
Updated the motherboard BIOS and GPU vBIOS to the latest versions

There is a lot more, but I can't think right now because of how upset I am. When I remember, I will update this post.


I can play a game like Left 4 Dead 2, which is totally fine since it runs at 0.743 V.

So I ran every 3DMark stress test and benchmark for about 4 hours, which pushes GPU usage to the max, and it never crashed. But just then I tried to play Phasmophobia for 5 minutes and it crashed. How does this even make sense in the slightest?

My PSU is the Corsair RM850. Also, my RAM is currently not in dual channel mode: when I bought the RAM I didn't realise it wasn't on the QVL for the motherboard, so I was having problems with it in dual channel mode, which were resolved when I put the 2 sticks in the last 2 slots. The RAM is the Team T-Create 64 GB (2x 32 GB) 3200 MHz. I will keep it like that until I get new RAM. I am mentioning this because I feel it may be the issue, but I highly doubt it since this problem seems to be very common with Aorus cards.

This is the error I am getting in Reliability History when it crashes: LKD_0x141_Tdr:6_IMAGE_nvlddmkm.sys_Ampere_3D
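
For reference, the pieces of that bucket string can be pulled apart with a short script. This is only a minimal Python sketch: the field layout is inferred from this one string, and the field names (`kind`, `code`, `tag`, `image`, `suffix`) are made up for illustration. The `0x141` part matches the bug check code Microsoft documents as VIDEO_ENGINE_TIMEOUT_DETECTED, and `nvlddmkm.sys` is the NVIDIA kernel-mode display driver.

```python
import re

# Pattern inferred from a single example string; treat the layout as an assumption.
BUCKET_RE = re.compile(
    r"^(?P<kind>[A-Za-z]+)_(?P<code>0x[0-9A-Fa-f]+)_(?P<tag>[^_]+)"
    r"_IMAGE_(?P<image>[^_]+)_(?P<suffix>.+)$"
)

def parse_bucket(bucket):
    """Split a reliability-history bucket string into named parts, or return None."""
    match = BUCKET_RE.match(bucket)
    return match.groupdict() if match else None

fields = parse_bucket("LKD_0x141_Tdr:6_IMAGE_nvlddmkm.sys_Ampere_3D")
print(fields)
```

The useful takeaway is that the failing image is the GPU driver and the code points at a GPU timeout, which fits the black screen with the rest of the PC still responsive.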

Also, when I crashed the last time I was logging with HWiNFO64. I will attach the log file here: https://drive.google.com/file/d/1rI-kAgtzMBTkyV6CT1V9V6qCpK5-x5sC/view?usp=sharing

Please, anyone, help me. I am so upset. All I wanted is to be able to play games, and I can't even do that. I have invested so much money in this PC.
 
Is it only Phasmophobia and L4D? Might be a bit difficult to determine the cause with just 2 games.


So I ran every 3DMark stress test and benchmark for about 4 hours, which pushes GPU usage to the max, and it never crashed. But just then I tried to play Phasmophobia for 5 minutes and it crashed. How does this even make sense in the slightest?
At the end of the day, 3DMark is still synthetic and is not the GPU stress test of all GPU stress tests. That it doesn't crash during hours of 3DMark, or the likes of Unigine Heaven/Superposition, but taps out within minutes in actual games isn't unusual.
Games can be better stress tests than these synthetic ones.

Oh, also, there's the possibility that the factory OC a card like this came with might not be stable, so you'll have to test by underclocking the default profile.

My PSU is the Corsair RM850.
The RM-nonX models have been found to not handle the transient power spikes from the high-end Ampere cards as well as the RMx ones.
I would add the PSU as a possibility to your checklist.

Interesting that GPU Memory Junction and GPU Hot Spot Temp show 0 °C after some time. That must be when the crash happens, but all the other GPU readings are normal.
You didn't OC the VRAM, did you? I believe factory OCs only touch the GPU core clock.
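
To pin down exactly when those two sensors drop to 0, the log can be scanned with a script instead of by eye. A minimal Python sketch, assuming the HWiNFO log is a CSV with a header row; the column names in the sample below are made up, so match them to the headers in the actual file.

```python
import csv
import io

def first_zero_reading(csv_text, time_col, sensor_cols):
    """Return (time, column) for the first row where a sensor reads 0, else None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        for col in sensor_cols:
            raw = (row.get(col) or "").strip()
            try:
                value = float(raw)
            except ValueError:
                continue  # skip blank or non-numeric cells
            if value == 0.0:
                return row[time_col], col
    return None

# Made-up sample rows for illustration only, not taken from the linked log.
sample = """Time,GPU Temperature,GPU Hot Spot
12:00:01,55,66
12:00:02,56,67
12:00:03,56,0
"""
hit = first_zero_reading(sample, "Time", ["GPU Temperature", "GPU Hot Spot"])
print(hit)
```

For a real log, read the file into a string first and pass that in; the file encoding is something to check, since it depends on how the log was exported.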


That's what I notice. Hopefully there will be input from others, or someone who's already experienced this.
 
Solution
@Phaaze88 thanks for this. And no, it's not just those 2 games, it's literally any game. L4D2 seems to be the only game that I can play. I am leaning more towards the PSU, because it is most definitely high voltage spikes that are causing the crash. I actually remember that it did ONCE crash in L4D2 when I was loading into a match. I noticed that in L4D2, as soon as you load into a game, the GPU load goes through the roof for a few seconds. So it totally seems like power spikes are to blame. After hearing this, are you also thinking it's a PSU issue? Also, I did not OC the VRAM. Thank you so much again. Reading this: "The RM-nonX models have been found to not handle the transient power spikes from the high-end Ampere cards as well as the RMx ones." calmed me down a bit lol. I had no idea this was a known issue with those PSUs.
 
@Phaaze88 but now that I remember, about a week ago I was playing Dead Island voltage-locked at 1 V and 1985 MHz (it was actually running at 1965 MHz) and was monitoring voltage and speed with the MSI Afterburner overlay. I clearly remember the overlay staying flat at that speed and voltage, with no sudden spikes, and it still black screened. That was about 15 minutes in. I then restarted and could play for an hour and a half just fine. So now I'm not too sure about the voltage spike theory. It seems that whenever I have a theory and think I have found the issue, another thing proves otherwise. Really annoying. What are your thoughts now? I also remember a lot of people having this issue with Aorus cards, and upgrading PSUs didn't fix it. Some also RMA'd and upgraded PSUs at the same time, and neither fixed the issue. But then I know someone with the same card who has 0 issues.
 
Go overkill: run it at 1800 MHz, downclock the memory a few notches, and keep the voltage reasonable. If it doesn't crash then, it must just be slightly unstable at or near factory settings.

Reports of memory on the backside of the card overheating are common enough; point a fan at the back of the card.

Also, the capacitors used on the back of the GPU have been known to cause issues at high GPU clock frequencies. Flashing is more of an advanced fix, but new vBIOS versions were released to take care of this problem.

(There is also the option of replacing the large SP-Caps with MLCC arrays; the solder pads for them should be under there, and they are all in parallel anyway. Not really a recommendation, though.)
 
@Eximo Hey, I think you may be right that it is high clock frequencies causing the crash, because I remember I undervolted to 900 mV at 1785 MHz and never crashed. I then removed the undervolt as I started to get artifacts. But for some odd reason I never actually just lowered the clock speed itself. I'm going to try that, because if I am not mistaken, high voltage spikes usually don't cause a crash but rather a temporary performance drop. I think it is probably the clock speed. If this gets sorted, thank you guys.
 
Hey Guys!
I just joined the forum to jump in on this discussion. I have been dealing with completely identical issues to yours, OUTBURSTPAL. It feels good to find someone with the exact same card and the exact same issues. I have been troubleshooting this since February of this year.

PC Specs...
Mobo- Asus ROG Crosshair VIII Formula
CPU- Ryzen 9 5950X
Memory- (2x) Corsair Vengeance RGB Pro 16GB 3600
PSU- Corsair HX1000
M.2- WD Black 1TB SN850 NVMe
GPU- Gigabyte Aorus RTX 3090 Xtreme Waterforce

What I have tried...
- Swapped the 850 W PSU for the current one
- DDU fresh install of the driver
- DDU, then rolled back to the latest driver available on Gigabyte's website. This version was significantly behind the one Nvidia offered, but I tried it anyway.
- Updated the GPU firmware with the app from Gigabyte
- Ran the PCIe slot in Gen3 mode, thinking the riser cable could be the issue
- Removed the power cable extensions
- Removed the riser cable and plugged the card directly into the mobo
- Logged all sensors using HWiNFO; temps and power show nothing out of the ordinary (can post the file if needed)
- Plus most things you have done that I did not list. I also see no issues in benchmark programs.

An interesting thing I would like to add: when looking at the sensor data in the log, you can see when the black screen happens because the GPU stops updating its sensor data while the CPU still sends fresh data. In other words, the GPU data flatlines while the CPU data continues. I have also gotten a few reports in Windows Reliability Monitor that Desktop Window Manager crashed. If you look in Task Manager, you will see this program runs on the GPU. So it seems the GPU loses communication (crashes), which then causes Desktop Window Manager to crash and gives the black screen...? The other unfortunate thing is that nothing shows up in Event Viewer except for the "improper shutdown" from me having to hard reset.
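
The flatline described above can also be looked for automatically. A rough Python sketch, assuming a CSV log with one row per polling interval; the column names and the threshold of 5 identical samples are hypothetical and need adjusting to the real log.

```python
import csv
import io

def find_flatline(csv_text, gpu_col, cpu_col, min_repeats=5):
    """Return the row index where gpu_col starts repeating one value for at
    least min_repeats rows while cpu_col keeps changing, else None."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    start = 0
    for i in range(1, len(rows) + 1):
        run_ended = i == len(rows) or rows[i][gpu_col] != rows[start][gpu_col]
        if run_ended:
            run = rows[start:i]
            cpu_values = {r[cpu_col] for r in run}
            if len(run) >= min_repeats and len(cpu_values) > 1:
                return start
            start = i
    return None

# Made-up sample rows for illustration only.
sample = """Time,GPU Power,CPU Power
1,300,90
2,310,95
3,305,92
4,305,97
5,305,91
6,305,99
7,305,93
"""
flat_start = find_flatline(sample, "GPU Power", "CPU Power")
print(flat_start)
```

A GPU reading can legitimately sit at one value for a while (at idle, for example), so a hit only marks a spot worth checking against the time of the black screen.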

I was just about to give up and reach out to Gigabyte when I happened to stumble on this post.
Tomorrow I will try lowering the default clock and play some games. The problem is that it is very random, just like you said. It can happen five minutes into a game, or I can go a week without it happening. So it may be a while before I can confirm whether that fixes the issue.

Please let me know if you have figured anything out or if anyone on here needs any more information. Hopefully this helps someone else and we can figure this out together. A warranty or RMA process just seems like a nightmare during times like this.

Thank you guys for the help so far!
 
Just wanted to give an update...

I turned my core clock speed down 120 MHz from factory and got a black screen about an hour into gaming yesterday.
I am going to try turning it down some more, along with the memory clock. Will report back with the results.

It is starting to look like I may have to RMA 🙁 .....

Thanks again for any help.
 
Wait, are you saying you turned it down to 120 MHz, or did you mean 1200? I capped mine at 1695 and it seems to be okay (ALTHOUGH I HAVEN'T PLAYED MANY GAMES SINCE THEN, SO IT COULD STILL POSSIBLY HAPPEN), but I did play a fairly demanding game afterwards and it seemed to be fine. Have you tried anything else?
 
Sorry for the delayed response!

I use Afterburner. I had it set just under 1600 and am still getting crashes.
Have you done anything for memory?
Did you also raise power?

I haven't tried anything else... starting to run out of ideas...
 
So I am coming back to this about 10 minutes after having a crash. I hadn't crashed in a while, and I even had my underclock set. Now I am lost. This issue is just too common with these damn cards to be a "hardware fault". We really need to get on Gigabyte's case about this. I'm just about to contact them, and I swear if they just tell me to "RMA" I am going to lose my mind. I will not RMA this card; I will wait for Gigabyte to fix their issue. I did lower the memory clock a little bit, but it clearly did nothing. I am so angry right now.
 
BTW, do you use XMP on your RAM? I heard that bad or missing XMP profiles can cause video-related crashes. Maybe these cards don't like running without XMP or something, idk lol. I know I am not using XMP, since my RAM is not on my motherboard's QVL; I can't run it in dual channel, nor can I use XMP, because it causes my system to malfunction in many different ways. (I AM UPGRADING RAM SOON, SO I'LL LET YOU KNOW IF THIS FIXES MY ISSUE.) I have heard of people swapping out PSUs and all that jazz and still having the same issue, but then they let their friends test the card in their systems and it's totally fine. Chances are it is our RAM. Well, at least I hope so.
 
I do currently have XMP enabled and running at the advertised speed. I guess I forgot to mention that this is something else I've tested. I've had crashes with XMP both enabled and disabled.
But, for testing, I will swap my memory with a friend's and see if that does anything.

Will report back in a couple days!
 