[SOLVED] RTX 3080 Ti having intermittent issues

Mar 30, 2022
Specs:
OS: Windows 10 Pro
CPU: i7-10700K @ 3.8GHz
Motherboard: ROG STRIX Z490-F Gaming
GPU: Gigabyte RTX 3080 Ti (GV-N308TVISION)
PSU: be quiet! Dark Power Pro 11 850W, BN653
RAM: 32GB (2 sticks) of G-Skill F4-3600C16-16GVKC
NIC: TG-3468
Monitors: 2x Viotek GN24C
Storage: SanDisk SDSSDH3 1T00, SanDisk SDSSDH3 500G, ST2000LX001-1RG174, WDC WDS200T2B0C-00PXH0

Most parts were purchased in November of 2020, with the exception of the GPU which was purchased in October of 2021, and the NIC, which was purchased February 2022.

I have not done any overclocking with these parts, although they may have factory overclocking.
I am running two identical monitors, both connected via DisplayPort to the rear of the GPU. Both monitors are running at 144Hz, and the Windows and NVIDIA settings have been set to match.
Issues mostly, but not exclusively, affect whichever monitor is set as monitor one in the OS. I have swapped and replaced cables several times, but the issues persist.
GPU is running driver version 30.0.15.1215
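
For cross-referencing with NVIDIA's download page: the version above is in the Windows (Device Manager) format. Assuming the usual convention that the last five digits carry the NVIDIA release number, it can be converted with a small helper like the sketch below. The function name is made up for illustration.

```python
def nvidia_release_from_windows_version(windows_version: str) -> str:
    """Convert a Windows-style driver version to the NVIDIA release number.

    Assumption: the last five digits of the Windows version (dots removed)
    encode the NVIDIA release as XXX.YY.
    """
    digits = windows_version.replace(".", "")
    if not digits.isdigit() or len(digits) < 5:
        raise ValueError(f"unexpected version string: {windows_version!r}")
    last_five = digits[-5:]
    # First three digits are the major release, last two the minor.
    return f"{last_five[:3]}.{last_five[3:]}"


print(nvidia_release_from_windows_version("30.0.15.1215"))  # 512.15
```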

Issues, in order of appearance:
  1. (Starting on purchase of GPU) Horizontal lines at top and bottom of primary monitor on and shortly after boot/wakeup. These are not my pictures, but show the issue: Example. Updated drivers and turned off power saving at a forum's suggestion; the issue persists.
  2. (Starting approximately 1 month ago) Vertical bar in center of primary monitor on and shortly after boot/wakeup. A more recent addition to the horizontal lines.
  3. (About a week ago) One-time occurrence of checkerboard artifacting. Computer behaved normally for several hours up until a game was started. Checkerboard mostly on primary screen, but also occurred in upper left-hand corner of secondary screen. Was only visible in game, disappeared when alt-tabbing to desktop. Screenshot of Issue. Updated drivers and restarted the computer; the issue has not reoccurred.
  4. (Today) Issues with power, in order:
    1. Power button pressed to turn computer on, no fans, lights, or other indication of power. Waited ~5 minutes
    2. Power button pressed again, "Shutting Down" screen briefly shown on primary monitor, along with artifacts mentioned in issues 1 and 2
    3. Power button pressed several times with a pause between each attempt, no sign of life from computer.
    4. PSU master power switch turned off, Waited ~5 minutes
    5. PSU master power switch turned on, computer boots without input from case power button. Seems to be working normally.

The computer is currently working normally, but I am not sure for how much longer, or if it will turn on if I turn it off in the future.
What steps can I take to narrow down the issue? Is there an NVIDIA diagnostics program available? I have run the OCCT stress test, but usually get impatient and stop it after 10 minutes or so.
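
One possible way to get more out of a longer stress run is to log GPU readings while it runs, so there is something to look at afterwards. The sketch below assumes `nvidia-smi` (installed with the NVIDIA driver) is on the PATH and uses its `--query-gpu` CSV output; the function names and the choice of fields are illustrative only.

```python
import subprocess

# Fields requested from nvidia-smi; this particular selection is an assumption.
FIELDS = ["timestamp", "temperature.gpu", "power.draw", "clocks.sm", "utilization.gpu"]


def parse_sample(csv_line: str) -> dict:
    """Split one CSV line from nvidia-smi into a field -> value mapping."""
    values = [v.strip() for v in csv_line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def read_sample() -> dict:
    """Query the first GPU once. Requires nvidia-smi to be installed."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])
```

Calling `read_sample()` every few seconds during the OCCT run and appending the results to a file would give a simple record of temperature, power draw, and clocks up to the point where any artifacting or shutdown occurs.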

Thanks for the help.
 
Solution
If there are any steps listed here that you have not already done, it would be advisable to do so if for no other reason than to be able to say you've already done it and eliminate that possibility.



First,

Make sure your motherboard has the MOST recent BIOS version installed. If it does not, then update. This solves a high number of issues even in cases where the release that is newer than yours makes no mention of improving graphics card or other hardware compatibility. They do not list every change they have made when they post a new BIOS release. In cases where you DO already have the latest BIOS version, simply resetting the BIOS as follows has a fairly high percentage chance of effecting a positive change in some cases so it is ALWAYS worth TRYING, at the very least.


BIOS Hard Reset procedure

Power off the unit, switch the PSU off and unplug the PSU cord from either the wall or the power supply.

Remove the motherboard CMOS battery for about three to five minutes. In some cases it may be necessary to remove the graphics card to access the CMOS battery.

During that five minutes while the CMOS battery is out of the motherboard, press the power button on the case, continuously, for 15-30 seconds, in order to deplete any residual charge that might be present in the CMOS circuit. After the five minutes is up, reinstall the CMOS battery making sure to insert it with the correct side up just as it came out.

If you had to remove the graphics card you can now reinstall it, but remember to reconnect your power cables if there were any attached to it as well as your display cable.

Now, plug the power supply cable back in, switch the PSU back on and power up the system. It should display the POST screen and the options to enter CMOS/BIOS setup. Enter the BIOS setup program and, if necessary, reconfigure the boot settings to use either the Windows Boot Manager or, for legacy systems, the drive your OS is installed on.

Save settings and exit. If the system will POST and boot then you can move forward from there including going back into the bios and configuring any other custom settings you may need to configure such as Memory XMP, A-XMP or D.O.C.P profile settings, custom fan profile settings or other specific settings you may have previously had configured that were wiped out by resetting the CMOS.

In some cases it may be necessary when you go into the BIOS after a reset, to load the Optimal default or Default values and then save settings, to actually get the hardware tables to reset in the boot manager.

It is probably also worth mentioning that for anything that might require an attempt to DO a hard reset in the first place, IF the problem is related to a lack of video signal, it is a GOOD IDEA to try a different type of display connection, as many systems will not work properly for some reason with DisplayPort configurations. It is worth trying HDMI if you are getting no display, cannot see the screen to enter the BIOS, or are getting "no signal" messages.

Trying a different monitor as well, if possible, is also a good idea if there is a lack of display. It happens.


Second,

Go to the product page for your motherboard on the manufacturer website. Download and install the latest driver versions for the chipset, storage controllers, audio and network adapters. Do not skip installing a newer driver just because you think it is not relevant to the problem you are having. The drivers for one device can often affect ALL other devices and a questionable driver release can cause instability in the OS itself. They don't release new drivers just for fun. If there is a new driver release for a component, there is a good reason for it. The same goes for BIOS updates. When it comes to the chipset drivers, if your motherboard manufacturer lists a chipset driver that is newer than what the chipset developer (Intel or AMD, for our purposes) lists, then use that one. If Intel (Or AMD) shows a chipset driver version that is newer than what is available from the motherboard product page, then use that one. Always use the newest chipset driver that you can get and always use ONLY the chipset drivers available from either the motherboard manufacturer, AMD or Intel.


IF you have other hardware installed or attached to the system that are not a part of the systems covered by the motherboard drivers, then go to the support page for THAT component and check to see if there are newer drivers available for that as well. If there are, install them.


Third,

Make sure your memory is running at the correct advertised speed in the BIOS. This may require that you set the memory to run at the XMP profile settings. Also, make sure you have the memory installed in the correct slots and that it is running in dual channel, which you can check by installing CPU-Z and checking the Memory and SPD tabs. For all modern motherboards with dual channel memory architectures, from the last ten years at least, if you have two sticks installed they should be in the A2 (called DDR4_1 on some boards) and B2 (called DDR4_2 on some boards) slots, which are ALWAYS the SECOND and FOURTH slots over from the CPU socket, counting TOWARDS the edge of the motherboard, EXCEPT on boards that only have two memory slots total. In that case, if you have two modules it's not rocket science, but if you have only one, then install it in the A1 or DDR4_1 slot.
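
A quick sanity check for the memory speed step: DDR memory transfers data twice per clock, so a kit rated at DDR4-3600 should show a DRAM frequency of about 1800 MHz in CPU-Z's Memory tab, not 3600. The sketch below does that arithmetic; the function names and the 15 MHz tolerance are assumptions chosen for illustration, not anything CPU-Z defines.

```python
def expected_dram_frequency_mhz(rated_mt_per_s: float) -> float:
    """DDR transfers twice per clock, so the real clock is half the rating."""
    return rated_mt_per_s / 2


def is_running_at_rated_speed(reported_mhz: float,
                              rated_mt_per_s: float,
                              tolerance_mhz: float = 15.0) -> bool:
    """Compare the DRAM frequency a tool reports against the kit's rating.

    tolerance_mhz is an arbitrary allowance for small clock deviations.
    """
    expected = expected_dram_frequency_mhz(rated_mt_per_s)
    return abs(reported_mhz - expected) <= tolerance_mhz


print(expected_dram_frequency_mhz(3600))          # 1800.0
print(is_running_at_rated_speed(1799.6, 3600))    # True
print(is_running_at_rated_speed(1066.7, 3600))    # False
```

A reading near 1066 MHz on a 3600-rated kit would suggest the memory fell back to a default speed, for example after a BIOS reset cleared the XMP setting.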



Fourth (And often tied for most important along with an up-to-date motherboard BIOS),

A clean install of the graphics card drivers. Regardless of whether you "already installed the newest drivers" for your graphics card or not, it is OFTEN a good idea to do a CLEAN install of the graphics card drivers. Just installing over the old drivers OR trying to use what Nvidia and AMD consider a clean install is not good enough and does not usually give the same result as using the Display Driver Uninstaller utility. This has a very high success rate and is always worth a shot.


If you have had both Nvidia and AMD cards installed at any point on that operating system then you will want to run the DDU twice. Once for the old card drivers (ie, Nvidia or AMD) and again for the currently installed graphics card drivers (ie, AMD or Nvidia). So if you had an Nvidia card at some point in the past, run it first for Nvidia and then after that is complete, run it again for AMD if you currently have an AMD card installed.



And last, but not least, if you have never done a CLEAN install of Windows, or have upgraded from an older version to Windows 10, or have been through several spring or fall major Windows updates, it might be a very good idea to consider doing a clean install of Windows if none of these other solutions has helped. IF you are using a Windows installation from a previous system and you didn't do a clean install of Windows after building the new system, then it's 99.99% likely that you NEED to do a CLEAN install before trying any other solutions.


How to do a CLEAN installation of Windows 10, the RIGHT way
 
Thanks for the suggestions.
I'm mostly just posting so that if issues reoccur I have a record of what was done.

BIOS Update:
Updated the BIOS from a USB flash drive but did not format the drive correctly (NTFS instead of FAT32), resulting in lots of bluescreens.
Recognized the issue, reformatted the USB drive to FAT32, and repeated the BIOS flash. Still bluescreens.
Googled the BSOD error codes, which attributed the BSODs to out-of-date drivers. Started updating motherboard drivers.
BSOD while updating motherboard drivers.
Several BSODs, driver update attempts, and Windows update installations later, Windows uninstalls some updates on its own. ("Reverting updates causing boot problems")
Reset BIOS as instructed in the prior post, and the BSODs go away. (So far)
Not sure if I'm using the current BIOS or the old BIOS now; don't really want to mess with it again.

System Drivers:
Updated all drivers to the newest versions found on the Asus or Intel websites.
Now that the constant bluescreens have stopped, this went mostly smoothly.
As the Z490-F has issues with its onboard NIC, I am using a dedicated NIC. Still updated the drivers for the onboard NIC, including flashing the onboard NIC firmware.
The driver download list included the Intel Rapid Storage Technology driver. This hung at 20% installed for about 45 minutes before I force-restarted the PC. As best I can tell, this driver does not apply to any hardware I have installed.

Memory
RAM is in the proper slots in the motherboard, the 2nd and 4th, counting out from the CPU.
Have set the RAM speed in BIOS to the proper 3600 MHz several times, but have also changed/reset/flashed the BIOS several times, so I will double check that it's currently at the correct speed.

Video Card Drivers
Download DDU and most recent NVIDIA drivers
Boot to safe mode
Run DDU several times, removing:
  1. NVIDIA graphics drivers
  2. Intel graphics drivers
  3. All sound drivers
All DDU runs were done in safe mode, with a restart between each.
Networking cable was unplugged for the duration of the DDU runs.
New NVIDIA drivers were installed in safe mode.

Windows Updates:
System was far behind on Windows updates.
Updates have been installed, with the exception of KB5011543, which throws error 0x80070002.
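
For the record, error codes like this one are HRESULT values and can be split into fields. The sketch below follows the standard HRESULT layout (failure bit, facility, code); the function name and the tiny lookup table are only illustrative and far from complete.

```python
FACILITY_WIN32 = 7  # facility value used for wrapped Win32 error codes

# Deliberately tiny table of Win32 error codes, for illustration only.
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
}


def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into failure flag, facility, and code."""
    value &= 0xFFFFFFFF
    facility = (value >> 16) & 0x7FF   # bits 16-26
    code = value & 0xFFFF              # bits 0-15
    meaning = WIN32_ERRORS.get(code) if facility == FACILITY_WIN32 else None
    return {
        "failure": bool(value >> 31),  # bit 31 set means failure
        "facility": facility,
        "code": code,
        "meaning": meaning,
    }


result = decode_hresult(0x80070002)
print(result["facility"], result["code"], result["meaning"])
# 7 2 ERROR_FILE_NOT_FOUND
```

So 0x80070002 is a wrapped Win32 "file not found" error. That narrows it down to a missing file somewhere in the update process, though it does not say which one.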

The computer is currently working normally, so hopefully the above mess actually changed something.

Thanks for the help.
 
Well, hopefully one of those did the trick. If not, I'd suggest that not being able to update the BIOS is itself an indication of a problem with the board, and I'm fairly sure that should be a warrantable problem. HOWEVER, with it being ASUS, I can tell you for absolutely certain that they do not like to make good on warranty replacement when it's a BIOS issue. I had a faulty BIOS on my Z170 Hero about 8 months into owning it. I sent it to them and they sent it back without doing anything to it, saying nothing was wrong that they could cover, all of which I had to pay for. So, since then I haven't been too happy with ASUS, and I know a bunch of others around here who've had customer service issues with them at the highest levels as well.
 
I'm pretty sure that the BIOS update issue was just me not following the directions properly.
I will probably give it another shot if I keep having issues, but for now I'm just going to hope for the best.
Thanks for letting me know about the ASUS warranty issues. Hopefully I don't have to go that route.
 
Ok, here's the deal. That board has BIOS flashback. THAT is what you SHOULD be using to update the BIOS but be sure to look at it, watch at least one of the many guides out there regarding using BIOS flashback on ASUS boards, and then when you are sure you know what you are supposed to do, do it that way. It's pretty much failsafe. That's kind of the point of it. Doesn't even require a CPU installed to use it.