Question GPU randomly dropping to 0% utilization

BaaaaL44 · Apr 30, 2024

I've been experiencing a strange issue for a few weeks. Every now and then when playing games (So far it has happened in AC:Origins, Lords of the Fallen, Dragon's Dogma 2) GPU utilization suddenly drops to 0%, along with GPU power, core voltage, and basically all other GPU metrics, and the game either freezes or stutters unplayably. Eventually (sometimes a few seconds, sometimes 30 seconds or so) it resumes normal operation. Here is a picture from OCCT that shows the problem:

https://imgur.com/a/KdzvHoY

CPU and PSU metrics seem unaffected. The GPU does not produce any artifacts and remains perfectly stable in 3DMark (99.7% frame consistency in a prolonged stress test) and other benchmarks/stability tests. I am using the latest AMD drivers, but the issue first appeared on the previous version, which I had been using for a while, so it's unlikely to be a driver issue. Nothing in my configuration has changed recently, except for getting a new screen (Alienware 2724DM). My specs:

Gigabyte Z790 UD AX
Intel i5 13600KF
AMD RX6800
Corsair Vengeance DDR5 6000, 32GB
Samsung 980 PRO 2TB
Asus ROG STRIX 850W Gold PSU
Be Quiet Dark Rock 4 cooler.

My GPU is fairly heavily overclocked (it is running at 2500 MHz), but thermals are within an acceptable range (the highest the hotspot has ever gotten was around 82-83 °C), and the CPU never reaches 70 °C. Does anyone have a clue why this may be happening or how to troubleshoot it? I am comfortable doing software stuff, but since my PC is still under warranty, and was custom-built in a shop, I cannot really dig around inside it.

Thanks in advance!

Roland Of Gilead · Apr 30, 2024

Hey there,

So, what happens when you run everything stock. No OC's? Same issues?

BaaaaL44 · Apr 30, 2024

Roland Of Gilead said:
Hey there,

So, what happens when you run everything stock. No OC's? Same issues?

Haven't tried that yet, that would be my next step at troubleshooting, but I'm a bit skeptical, because I've been using these exact OC settings for half a year now without any issues. The issue also presents itself pretty randomly, but I'll probably play around for a while to see if it also happens at stock.

UPDATE: Tried it on stock, produced a very similar usage drop within a few minutes in LOTF, but it did not drop to 0%, only to like 50, and from 80fps to 40-ish.

UPDATE 2: Same issue after DDUing and reinstalling the driver and trying stock settings.

BaaaaL44 · May 1, 2024

Roland Of Gilead said:
Hey there,

So, what happens when you run everything stock. No OC's? Same issues?

Tried with and without OC, and after reinstalling drivers from scratch: exact same issue. It's sporadic, and sometimes I have to play for hours to experience it, but it is definitely there, and it wasn't there before. Forgot to mention, but there is also zero artifacting or other indication of a GPU fault. I'd assume if it was CPU-related, CPU utilization would also drop along with the GPU because it wouldn't be giving the GPU instructions. What else can it be? PSU? OCCT shows consistent voltages across all rails.

Roland Of Gilead · May 1, 2024

I don't see it being the PSU, but if it's not functioning correctly, then it could be it's not providing enough clean power. Do you have another you can swap out and try?

BaaaaL44 · May 1, 2024

Roland Of Gilead said:
I don't see it being the PSU, but if it's not functioning correctly, then it could be it's not providing enough clean power. Do you have another you can swap out and try?

Unfortunately no, and I wouldn't really be comfortable messing around in an SI-built rig under warranty. For current troubleshooting I decided to run an OCCT gold stability certificate for all components to see if something is off. GPU test has been running for 3 hours with zero errors so far. For some reason synthetic tests don't seem to produce the problem. Any idea why that might be?

Roland Of Gilead · May 1, 2024

BaaaaL44 said:
For some reason synthetic tests don't seem to produce the problem. Any idea why that might be?

Yes, that's typical. Synth testing is very rigid, and the load doesn't really fluctuate, although depending on the app, you can make the load variable. Gaming, on the other hand, loads both the CPU and GPU, and the GPU switches load type and changes frequencies very quickly. Those rapid changes produce short power spikes (transients). It could be that the spikes are causing the issue.

Typically PSU issues would be shutting down midgame, and restarting with no faults or messages.

BaaaaL44 · May 1, 2024

Roland Of Gilead said:
Yes, that's typical. Synth testing is very rigid, and the load doesn't really fluctuate, although depending on the app, you can make the load variable. Gaming, on the other hand, loads both the CPU and GPU, and the GPU switches load type and changes frequencies very quickly. Those rapid changes produce short power spikes (transients). It could be that the spikes are causing the issue.

Typically PSU issues would be shutting down midgame, and restarting with no faults or messages.

I think OCCT has a variable graphics test that is supposed to test transients specifically by switching between low and high power states extremely quickly, but it's probably not the same as actual games. The weird thing is, sometimes I can game for hours without seeing the issue a single time with my core clock OC'd by 200 MHz, then it locks up for a few seconds, then continues on normally.

Do you have any other software-based suggestions apart from the OCCT certificates? Or should I just use it for the time being and see if it either gets fixed by a driver update or becomes regular/systematic enough that the cause is clearly identifiable?

35below0 · May 1, 2024

You can try to disable XMP and run the RAM at JEDEC. This isn't going to directly solve anything but it may have an effect on the motherboard and 13600KF, and how the CPU is being overclocked.

If GPU utilization changes, then it eliminates the GPU as the culprit and you can maybe move on to slapping some sense into Gigabyte's PerfDrive.

I'm not sure this is the problem but you can never tell. Disabling XMP will give it away if it is the problem.

Roland Of Gilead · May 1, 2024

Hmmm. At this stage, I think it's worth updating the bios to rule that out. What's the current bios?

The latest is F10F, here : https://www.gigabyte.com/Motherboard/Z790-UD-AX/support#support-childModelsMenu

Make sure to clear CMOS after the update (if your bios is not updated)

BaaaaL44 · May 1, 2024

Roland Of Gilead said:
Hmmm. At this stage, I think it's worth updating the bios to rule that out. What's the current bios?

The latest is F10F, here : https://www.gigabyte.com/Motherboard/Z790-UD-AX/support#support-childModelsMenu

Make sure to clear CMOS after the update (if your bios is not updated)

I'm using the factory BIOS, but since it worked for more than half a year without this issue, wouldn't that rule out BIOS as a potential culprit?

Roland Of Gilead · May 1, 2024

Aida64 Extreme is another good one I use. It has a number of different benches, and a System Stability Test too, which you can choose components in the testing.

Roland Of Gilead · May 1, 2024

BaaaaL44 said:
I'm using the factory BIOS, but since it worked for more than half a year without this issue, wouldn't that rule out BIOS as a potential culprit?

No. If there were any drivers or apps installed with the older bios, it could cause many bugs.

I would strongly suggest updating it, not only for potential bug fixes, you also get enhanced system/CPU performance and vital security fixes. Time to update!

35below0 · May 1, 2024

BaaaaL44 said:
I'm using the factory BIOS, but since it worked for more than half a year without this issue, wouldn't that rule out BIOS as a potential culprit?

Factory BIOS, as in it's never been updated?

Problem with that is the motherboard sat in a box on a shelf for a long time before you bought it. It would be wise to update it one time at least, just because the BIOS is very out of date.

Normally, updating BIOS when nothing is broken is a bad idea. Having a factory BIOS is also a bad idea though.

BaaaaL44 · May 1, 2024

35below0 said:
You can try to disable XMP and run the RAM at JEDEC. This isn't going to directly solve anything but it may have an effect on the motherboard and 13600KF, and how the CPU is being overclocked.

If GPU utilization changes, then it eliminates the GPU as the culprit and you can maybe move on to slapping some sense into Gigabyte's PerfDrive.

I'm not sure this is the problem but you can never tell. Disabling XMP will give it away if it is the problem.

Would XMP start to act up after working for half a year, if the CPU is at stock clocks and its utilization does not change when GPU clocks drop?

35below0 said:
Factory BIOS, as in it's never been updated?

Problem with that is the motherboard sat in a box on a shelf for a long time before you bought it. It would be wise to update it one time at least, just because the BIOS is very out of date.

Normally, updating BIOS when nothing is broken is a bad idea. Having a factory BIOS is also a bad idea though.

Sorry, just checked, the BIOS was, in fact, updated when assembled. It's a June 2023 bios, and technically what I have is revision 5, now we are at revision 9 officially, with a beta version for 10 available. So it's not technically the factory bios.

35below0 · May 1, 2024

No, XMP enabled affects CPU overclock weirdness that motherboards can do. If you switch XMP off and your 0% utilization drops no longer happen, then it can be assumed the problem is being caused by the motherboard and not the GPU.

Would XMP act up? No. It cannot. Maybe some update caused problems to appear.
It's recently become known that motherboard manufacturers by default overclock CPUs, and push them outside intel recommended settings. This can invisibly make systems unstable. All it takes is one OS update or driver update to expose this instability. It may be related to your GPU usage drops.

Or not, but it's one less variable to worry about.

As for the BIOS, i think you're fine with F5. F10 or 11 will probably be something to consider updating to, because Gigabyte is working on new, stable defaults to deal with the PerfDrive instability i mentioned.

You may try F9 to see if it cures your issue, but i'd look at the BIOS update history to see if changes are related to GPU usage or not. No point flashing BIOS if it doesn't help.

BaaaaL44 · May 1, 2024

35below0 said:
No, XMP enabled affects CPU overclock weirdness that motherboards can do. If you switch XMP off and your 0% utilization drops no longer happen, then it can be assumed the problem is being caused by the motherboard and not the GPU.

Would XMP act up? No. It cannot. Maybe some update caused problems to appear.
It's recently become known that motherboard manufacturers by default overclock CPUs, and push them outside intel recommended settings. This can invisibly make systems unstable. All it takes is one OS update or driver update to expose this instability. It may be related to your GPU usage drops.

Or not, but it's one less variable to worry about.

As for the BIOS, i think you're fine with F5. F10 or 11 will probably be something to consider updating to, because Gigabyte is working on new, stable defaults to deal with the PerfDrive instability i mentioned.

You may try F9 to see if it cures your issue, but i'd look at the BIOS update history to see if changes are related to GPU usage or not. No point flashing BIOS if it doesn't help.

Thanks! Currently running a combined CPU and RAM test in OCCT to see if anything comes up but I kind of doubt it because I had been using the config for almost a year without problems. Although I guess it's possible that an overzealous mobo OC profile degraded the CPU marginally so now it's unstable with factory default clocks.

Noth666 · May 1, 2024

Stumbled across this and saw a lot of suggestions, but not a clear yes/no to the most fundamental question in a case like this. With all OC off, does the issue appear at all or not? If not, the troubleshooting should focus on which aspect of the OC causes the issue. If yes, it matters a lot whether there is a way to test components individually.
I saw suggestions of updating bios, or testing various bios versions. I would advise against this, in particular trying multiple or some such. Bios updates can brick the motherboard and/or other things, so they should never be a first go-to or done willy nilly as a random stab at changing something. There are many things that can be done relatively safely to figure out what is going on, updating bios is NOT one of them.

BaaaaL44 · May 1, 2024

Noth666 said:
Stumbled across this and saw a lot of suggestions, but not a clear yes/no to the most fundamental question in a case like this. With all OC off, does the issue appear at all or not? If not, the troubleshooting should focus on which aspect of the OC causes the issue. If yes, it matters a lot whether there is a way to test components individually.
I saw suggestions of updating bios, or testing various bios versions. I would advise against this, in particular trying multiple or some such. Bios updates can brick the motherboard and/or other things, so they should never be a first go-to or done willy nilly as a random stab at changing something. There are many things that can be done relatively safely to figure out what is going on, updating bios is NOT one of them.

I think I updated my original query at some point. The CPU is not OC'd at all; the GPU is, pretty heavily (2500 MHz, with +10% TDP and the maximum VRAM frequency Adrenalin allows), but the issue also appears when I set everything back to factory defaults, so I don't think it's the OC tbh.

BaaaaL44 · May 2, 2024

Sorry for bumping: CPU and RAM are also stable according to OCCT (linpack, AVX2). Is there any chance this is something introduced by a windows update? IIRC there was a large update a few weeks ago. Should I try uninstalling it for testing purposes?

35below0 · May 2, 2024

BaaaaL44 said:
Sorry for bumping: CPU and RAM are also stable according to OCCT (linpack, AVX2). Is there any chance this is something introduced by a windows update? IIRC there was a large update a few weeks ago. Should I try uninstalling it for testing purposes?

It probably isn't related but you may as well give it a shot.

BaaaaL44 · May 2, 2024

35below0 said:
It probably isn't related but you may as well give it a shot.

Thanks! In the past with my previous rig I've had Windows updates cause BSODs, so who knows... Assuming the hardware is sound (OCCT seems to think so), what other software options are there? DirectX? Some background process bogging it down?

35below0 · May 2, 2024

Nope. Drawing blanks.

Re-read your OP and a couple of things came to mind.

How come your CPU never gets close to 70C with the Dark Rock 4? I have the 13600K, cooled by a Noctua NH-D15 with both the standard and optional 140mm fans working. It runs up to 76-78C. Possibly even higher, i haven't stress tested too much.
Idk. I'd expect the Dark Rock 4 to perform worse. Could be case cooling. Hmm. idk

You said the PC is custom build and under warranty, and that you cannot root around it too much. So take it back to the shop and ask them to thoroughly check it. You have this team of techies at your disposal so why not use the opportunity?

It's nothing obvious i can think of and it seems nobody else at Tom's has a quick answer.

BaaaaL44 · May 2, 2024

35below0 said:
Nope. Drawing blanks.

Re-read your OP and a couple of things came to mind.

How come your CPU never gets close to 70C with the Dark Rock 4? I have the 13600K, cooled by a Noctua NH-D15 with both the standard and optional 140mm fans working. It runs up to 76-78C. Possibly even higher, i haven't stress tested too much.
Idk. I'd expect the Dark Rock 4 to perform worse. Could be case cooling. Hmm. idk

You said the PC is custom build and under warranty, and that you cannot root around it too much. So take it back to the shop and ask them to thoroughly check it. You have this team of techies at your disposal so why not use the opportunity?

It's nothing obvious i can think of and it seems nobody else at Tom's has a quick answer.

Okay so I'll try to answer everything:

Dark Rock 4 performs remarkably well in normal gaming workloads in a suitably ventilated case. I have a Corsair 4000D Airflow and I very rarely see my CPU reach 70C. Not saying it has never happened (BF2042 comes to mind) but it is very uncommon. Of course it hits 90+ in stress tests but that's not my primary use case. The CPU is performing to spec in Cinebench.

The reason I'm hesitant to send it back to the shop is because their repair shop would basically be running the same tests, 3DMark, OCCT, what have you, or replace the GPU and be done with it. There is no way in hell any system integrator in Hungary will be willing to spend weeks replacing a component, playing random games sometimes for hours, hoping the glitch rears its head, then replace another component, etc. They simply don't have time or the willingness to hunt down the cause of such an elusive issue when they have 20+ configs to build a day, just as many actually, identifiably bad (artifacting, crashing) components to replace and webshop orders to dispatch.

Since the rig is still under warranty for more than a year (most components came with 2 years) I'd prefer to troubleshoot myself first (or at least gather evidence/data on the issue to help the repair guys if it comes to that) before I commit to 2-3 weeks without my PC for a component change that might not even solve it.
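For the evidence-gathering part, one option is to export a sensor log while gaming and scan it afterwards for the drops, so each episode has a timestamp and a duration. The following is only a minimal sketch: the function name, the 5% threshold, and the demo data are made up for illustration, and it assumes the log has already been read into (timestamp, GPU utilization) pairs from whatever the monitoring tool exports.

```python
from datetime import datetime, timedelta


def find_drop_episodes(samples, threshold=5.0, min_samples=2):
    """Find runs of consecutive samples where GPU utilization is at or below threshold.

    samples: iterable of (timestamp, gpu_util_percent), in chronological order.
    Returns a list of (first_timestamp, last_timestamp, sample_count) tuples.
    Runs shorter than min_samples are ignored as one-off dips.
    """
    episodes = []
    run = []  # timestamps of the current low-utilization run
    for ts, util in samples:
        if util <= threshold:
            run.append(ts)
        else:
            if len(run) >= min_samples:
                episodes.append((run[0], run[-1], len(run)))
            run = []
    # Close a run that is still open when the log ends.
    if len(run) >= min_samples:
        episodes.append((run[0], run[-1], len(run)))
    return episodes


# Hypothetical demo data: one sample per second, not taken from a real log.
start = datetime(2024, 5, 2, 20, 0, 0)
utils = [98, 97, 99, 0, 0, 1, 0, 96, 98, 0, 97]
samples = [(start + timedelta(seconds=i), u) for i, u in enumerate(utils)]

episodes = find_drop_episodes(samples)
for begin, end, count in episodes:
    print(f"{begin:%H:%M:%S} -> {end:%H:%M:%S} ({count} samples)")
```

Raising `threshold` (to 60, say) would also catch the partial drops to around 50% seen at stock settings. A list of episodes with times and durations is probably more useful to a repair shop than a single screenshot.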

Hope that makes sense!

35below0 · May 2, 2024

Makes a lot of sense, thanks for the reply.

I guess i was naive to hope a repair shop would actually investigate the issue. Esp. an elusive one.
If you took it to them they would probably replace the 6800 and be done with it.

Hmm... Where is the Samsung 980 PRO 2TB? The CPU side M.2 slot?
That would rob the 6800 of some of its full capability as the PCIe 5.0 x16 slot downgrades to x8, but it should not cause problems.
Maybe the 980 Pro is doing something that causes the PCIe slot to choke.

You cannot really move the NVMe around, but you can check if there is something weird going on when GPU util. drops to 0%. Are there any crazy reads or writes? Is it overheating?
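A rough way to check this from a sensor log is to compare the drive's readings during the drops with its readings the rest of the time. This is just a sketch under assumptions: the column names (`gpu_util`, `ssd_read_mbps`, `ssd_temp_c`), the 5% threshold, and the sample rows are hypothetical, and a real log would need to be mapped to this shape first.

```python
from statistics import mean


def compare_during_drops(rows, metric, gpu_key="gpu_util", threshold=5.0):
    """Return (mean during drops, mean otherwise) for one logged metric.

    rows: list of dicts, one per log sample.
    A sample counts as a drop when rows[i][gpu_key] <= threshold.
    Either side is None if no samples fall into it.
    """
    during = [r[metric] for r in rows if r[gpu_key] <= threshold]
    normal = [r[metric] for r in rows if r[gpu_key] > threshold]
    return (
        mean(during) if during else None,
        mean(normal) if normal else None,
    )


# Hypothetical samples, invented purely to show the comparison.
rows = [
    {"gpu_util": 98, "ssd_read_mbps": 40, "ssd_temp_c": 52},
    {"gpu_util": 97, "ssd_read_mbps": 35, "ssd_temp_c": 53},
    {"gpu_util": 0, "ssd_read_mbps": 900, "ssd_temp_c": 61},
    {"gpu_util": 0, "ssd_read_mbps": 1100, "ssd_temp_c": 63},
    {"gpu_util": 99, "ssd_read_mbps": 45, "ssd_temp_c": 54},
]

for metric in ("ssd_read_mbps", "ssd_temp_c"):
    during, normal = compare_during_drops(rows, metric)
    print(f"{metric}: during drops = {during}, otherwise = {normal}")
```

If the drive's numbers look the same in both groups, the SSD is probably not involved. A large gap would only be a hint worth following up, not proof of the cause.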
