Fluctuating GPU Utilization (Not Throttling) Experts Help!

Bahazbz

I have a problem, folks. Recently, my R7 370 started showing some strange symptoms. I was running the Bioshock Infinite built-in benchmark and noticed wild FPS fluctuation at random points. The same thing occurred when I ran the benchmark again: random drops from 107 FPS to around 30 FPS (not during scene changes). At the same time as these FPS drops, Afterburner showed corresponding drops in GPU utilization from a pegged 100% to 64%, then back up to 100%. This made me curious, so after making sure the Windows PCIe power saving settings were off, I ran the Unigine Valley benchmark and messed around with my GPU settings for a few hours. The exact same issue was present. It would run great for a few seconds at 100%, then drop to about half the FPS and show only 64% utilization on the GPU for about 5-10 seconds, then go back up to 100%. It was always exactly 64% each time. These drops correspond to dips in the core clock frequency down to 865MHz, then back up to stock. I have never seen this type of behavior in a benchmark before!

Specs:
Windows 10 Professional
8GB DDR3 GSkill RAM
AMD FX-8320 @ 4.4GHz
Gigabyte Windforce R7 370 OC 2GB GDDR5
Seasonic Bronze 520W power supply
MSI Gaming 990FX AM3+ Mobo


I have done a bit of research online and found a few people having similar problems, but most of them had simple solutions. Whatever I have appears to be more complicated.

And before you say it, I have already ruled out the most common causes. Here is what it is NOT:

1) This is not thermal throttling - I keep my GPU below 70C at all times, well below the point of thermal throttling. I have an aggressive fan curve set up in Afterburner that kicks in if the GPU temps go anywhere above 62C.
*While I am certain this is not the issue, manually setting the GPU fans to 100% did decrease the frequency of the occurrence.

2) This is not a low workload for the GPU - It is Unigine Valley, the GPU benchmark, for god's sake! If this benchmark (with V-sync off) is not forcing my card to 100% load, I don't know what could.

3) This is not a CPU bottleneck - I have kept a close eye on all "eight" cores of my FX-8320. None of the CPU cores ever goes above 50% utilization, including when my GPU has these drops.

4) I don't have V-sync on - I've been PC building and gaming for years now. I know that my utilization will drop when I have told my GPU to wait for vertical refresh. Please don't insult me by asking me to check if V-sync is on.

5) This is not a bad overclock... I hope. I had an overclock set in Afterburner originally. I set everything back to defaults in Afterburner when I noticed this fluctuation. The problem is still frequent at stock speeds (925MHz core, 1400MHz memory clock, +0% power limit, etc.). Everything is at stock, but my GPU appears to be throttling back on random occasions as if I had an unstable OC.

Has anyone else had similar symptoms before? Any ideas what this could mean?

I read something about too much power being drawn from one of the rails on my PSU causing a temporary "brown out" situation for my GPU. Experts: Is there any merit to this? It seems to make sense, as the utilization drops seem to be rather random when running the Valley benchmark, and for some reason increasing the voltage limit to +8% greatly reduced the frequency of these drops in utilization. I suspect I might be approaching my 520W power supply's limits, but I am not that well versed when it comes to power supplies.

I already tried dropping my overclock in afterburner back to stock speeds, and the problem still persists. Could I have somehow damaged the GPU by having an OC where I didn't even touch the voltage?

At this point I am starting to think it is either the power supply or the GPU itself that is the problem, which would be a bummer, as I know there is a widespread GPU shortage right now and I am not looking forward to replacing a PSU and rewiring my entire case either. Luckily, I was savvy enough to get a warranty on both those parts just in case of issues like this, but RMA would still be a last resort.

I'm at my wits' end with this. Any and all answers are much appreciated!
-Bahazbz
 
Does this happen with all games?
Do you notice screen tearing when playing? Are the drops in utilization happening at specific points in game (moving from wall view to window outside or through the door, looking at specific objects in game etc)?
Is your monitor using freesync? What's the monitor refresh rate?
 
I think it has something to do with a Radeon setting. Do you have Radeon Chill on? Power saver mode? FRTC? Enhanced Sync? FreeSync? Check your settings and maybe do a clean driver install to the latest driver. There are how-to guides for that.

You are correct with your deductions, however. An 8250 can be a bottleneck, but not in those benchmarks. I've had my 7870 give the same score in Unigine Superposition with an AMD Athlon 5350 as with my Ryzen 7 1800X.

Does this stutter affect your ability to play games? Do you notice it? If not, it shouldn't be a big deal imo. 370x is not a powerhouse of a card, simply a refined and overclocked 7870.
 



When I do play games, I prefer to play at high refresh rates, so I do notice the spikes and dips. If I was fine with 30fps performance and random lag spikes, it would be fine, but that is not what I paid for and that's not how I roll.

I will check my Radeon Settings soon. That's something I didn't consider initially. Thanks!

Update:
So I checked the settings you mentioned. Radeon Chill was off, Frame Rate Target Control was off, Enhanced Sync (which I didn't even know existed until just now) was off. I can't find the setting for power saver mode in Radeon Settings tho. It may be in a later driver release. As of now I'm on 17.1.1, so maybe it is time to update and do a clean install. I will try that if no one else has any more solutions. Thanks for the advice anyway! :)
 


Happens in all my games and all my benchmarks. No freesync support on my monitor, and I do notice a little screen tearing but not more than usual. My monitor is only 75Hz and the drops happen in random places at random times, regardless of what's going on onscreen. Even in the synthetic benchmark it happens at random intervals and not at the same points during each loop. It's also a relatively new issue that wasn't present a few months ago. This doesn't seem to be tied directly to GPU load.

 
I don't think you're approaching load limits at that wattage on paper, however, depending on wiring in your place, length of time you've had that psu, it might be approaching end of its lifetime. If you're worried that it might be throttling the gpu, then you could test out the voltage with a voltmeter (do not use hwinfo and such cause it won't report proper numbers to you for this), and see if they're still within tolerances:
https://www.wikihow.com/Check-a-Power-Supply

My suspicion beyond this would be either the overclock or something within the video settings but as it's been a while since I've had ati, I can't really guide you through it.

As for screen tearing, as you probably know, a 75Hz monitor can only show you 75 fps max. The benchmark will report more, and the card will try to push more to the screen, but because it's a hardware limitation you'll end up with tearing, as only 75 can physically fit in. Assuming you know all this since you're a builder, I would try capping your fps at different levels around 75 and see if you still get the tearing, but without the swings between fps spikes and drops that so often piss people off. Since you haven't had these issues before, this might not be the solution for you, but you might want to play around with it all the same just to check. This is something where freesync or a higher refresh monitor would help a lot. If you know someone with one, it might be worth your while testing whether the situation improves with a different monitor.
Again doesn't help with your immediate issue of this problem appearing suddenly but it's something to think about down the road.

If you have an overclock on, do the issues disappear once you revert to the default clock?
 


I bought the PSU new in May 2015 and the computer has been pretty lightly used since. Brownouts and spikes are pretty uncommon in my area, so I doubt it's dying of old age. That is another thing to look into tho. I didn't think about that.

I too doubt I'm approaching the power supply's 520w limit. I have been thinking about buying a watt-meter just to see how much power the power supply is pulling from the wall (in total) under load. If the power supply is 80+ certified, am I correct in my assumption that if the power supply is drawing less than 520w * 0.8 then I should be within the limits of its operation as far as TDP is concerned? I know things get much more complicated when it comes to efficiency of the supply itself and power draw on each rail, but would taking some basic measurements of draw from the wall help me rule out the PSU as the culprit? I'm thinking if I see sub 400w drawn from the wall, I am probably not exceeding the total wattage the PSU can supply.

I will also look into getting a voltmeter to test the individual pin outputs from the power supply, but I would need to have one of my electrician friends teach me how to do this. If it does turn out to be a PSU malfunction, it is still under warranty, so I will be covered.

Btw I'm not concerned about screen tearing at all, that's not an issue for me. I know it's bound to happen when not using V-sync. I'm used to it by now :)

And yeah, I have plenty of monitors lying around. I will re-test with a few other displays with different refresh rates to see if they help. However, I don't have one with freesync available to me unfortunately. I am pretty sure you suggested this as a solution to tearing, but it couldn't hurt to try it and see what happens. At this rate I'm willing to try anything.

As for the overclock - No, the problems are still present at factory stock speeds (1015MHz core, 1400MHz mem) as well as AMD stock speeds (925MHz core, 1400MHz mem). I even tried under-clocking my card down to 715MHz core, and the problem became less frequent, but utilization was still dropping every 20 seconds or so. Really strange behavior indeed.


Edit- I should also mention, increasing the power limit in afterburner seemed to decrease the frequency of the fluctuations, which is what made me think it was a power issue in the first place.

Thanks for your response! I will try what you suggested.
 
Here's a good explanation regarding the certification:
http://www.tomshardware.com/answers/id-2158823/question.html
It's not that the power draw is less, it's that a certain percentage of power is wasted as heat, and the higher the certification, the less this is the case. So if you need to draw 100W on an 80+ certified psu, then the power draw from the wall will actually be more than 100W because a percentage will be wasted as heat. How much more depends on the total power draw, and psus typically have an efficiency curve for this.
Total power draw will differ depending on what you're doing. So if you're playing witcher 3 on really good settings, obviously that will be very different from just browsing the internet.
The other point is if voltages on individual rails tend to exceed 5% variation limits (usually because of heat related effects) that can have very detrimental effects on your mobo, gpu and cpu. After all these are very sensitive electronics and fluctuations like that can cause issues, as well as over time kill the components. So check the voltages. More on that topic here:
https://en.wikipedia.org/wiki/Power_supply_unit_(computer)
https://www.pcmech.com/article/why-your-power-supply-choice-is-so-important/

So yes, if voltage increase or voltage range increase helps, psu is the thing to check. If psu is ok, then I'd contact amd directly as it may be a hardware issue though if you sound really unhappy and show them the numbers, they'll likely just throw the rma at you without trying to figure out the details. Asking on their forums might be also a good idea as other users may have run into this and may have a better idea what's going on.
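To put rough numbers on the efficiency point above, here is a minimal Python sketch. It assumes a flat 80% efficiency, which is a simplification since real units follow a load-dependent curve, and the function name `wall_draw_watts` is made up for illustration.

```python
def wall_draw_watts(dc_load_watts: float, efficiency: float = 0.80) -> float:
    """Estimate power pulled from the wall for a given component (DC) load.

    efficiency is the fraction of wall power that reaches the components.
    0.80 is an assumed flat value; real PSUs vary with load.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return dc_load_watts / efficiency


if __name__ == "__main__":
    for load in (100, 400):
        wall = wall_draw_watts(load)
        print(f"{load} W of components -> about {wall:.0f} W at the wall "
              f"({wall - load:.0f} W lost as heat)")
```

Note that a PSU's wattage rating refers to what it can deliver to the components, so it is the load figure, not the wall figure, that gets compared against the 520W rating.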
 


Right, I learned much of that PSU information from JonnyGuru back when I was building this very rig. I had it backwards, I think. If my TDP for every component in my system is 400w (not including the PSU itself), I can expect to see roughly 500w drawn from the wall at 80% efficiency. That makes more sense.

Thanks Sedivy, this was good info! :) I'll have to pull out my PSU and test all the rails individually, checking for fluctuations.

One more question for the PSU expert; would approaching the wattage limit cause these fluctuations you were talking about, or is that something that only happens when the PSU is actually dying or defective?

Edit- I ordered a Kill A Watt energy monitor, just to see what my draw from the wall would be. I will report back with what I find.
 
If you're near the limit, sudden spikes in power usage may push it over and in essence choke your performance, because voltage will suddenly drop and you're not getting the power you need. If it's bad enough and cpu/gpu don't downshift, then you may get errors and such. For a given current, voltage and power are directly proportional (P=V*I).
The fluctuations are really due to heat related changes in the components: capacitors in particular wear out over time, and the more heat the psu gives off, the faster this happens. The closer you are to its total wattage limit, the more heat it'll produce. This is also why quality of psu is just as important as total wattage, if not more. Crappy no name brands won't get anywhere near the wattage reported on paper in tests, they'll heat up a lot, and their capacitors will blow within a year or two.
It's also why I tend to go for some overhead in total wattage in my psu though this is debated as some say keep it at 40-60% load, some cite 50-80%, above 80% and so on. Usually it's not good to get psu that just barely covers your power usage on paper is the point, but there's no need to go crazy either.
I don't know about individual meters. I'd probably get a multimeter to get current and voltage and then calculate wattage from that but whatever is easiest for you. Just don't spend a lot of cash on all this. I thought you might borrow someone's tools for a sec or something :) Otherwise I'd have advised you to just go into repair shop for them to measure this for you.
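As a minimal sketch of calculating wattage from multimeter readings with P=V*I, something like the following Python would do. The current values in the example are hypothetical placeholders, not measurements from this thread, and the function names are made up for illustration.

```python
def rail_power_watts(volts: float, amps: float) -> float:
    """Power delivered on one rail: P = V * I."""
    return volts * amps


def total_power_watts(readings: dict[str, tuple[float, float]]) -> float:
    """Sum P = V * I over all rails; readings maps rail name -> (volts, amps)."""
    return sum(rail_power_watts(v, a) for v, a in readings.values())


if __name__ == "__main__":
    # Hypothetical example readings as (volts, amps); replace with your own.
    readings = {
        "+12V": (12.0, 15.0),
        "+5V": (5.0, 4.0),
        "+3.3V": (3.3, 2.0),
    }
    for rail, (v, a) in readings.items():
        print(f"{rail}: {rail_power_watts(v, a):.1f} W")
    print(f"total: {total_power_watts(readings):.1f} W")
```

The total is what the components are drawing, so it can be compared directly against the psu's rated wattage.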
 


Good to know. I re-ran some of my load calculations using a few online PSU calculators. A lot has changed since I built this system in 2015, and the calculators show it. I know these are not entirely accurate, but on average they are showing a load of 450w, which would mean roughly a 562w draw from the wall at 80% efficiency. If this is the case, that would explain my issue. 😀

And don't worry, I was planning on buying one of these kill-a-watt monitors for a while now. It is just a simple socket that allows for the monitoring of the wattage passing through it. I was going to use it to check the power draw differences between my system overclocked vs not anyway and I already have a multi-meter I use every once in a while, so it's no skin off my back. The watt-meter arrives Sunday. I will test that first and if it is not the TDP causing the issue I'll start testing individual rails. I'll get back to you when the results are in. Thanks again, my friend.

P.S. Now this is what I call "Best Answer" material! :bounce:
 
I got the Kill A Watt meter last night and tested it. According to the meter I'm only pulling around 250w from the wall during the synthetic Valley benchmark. Of course, the fluctuation issue was still present. :/ I can run some more tests with the watt meter with more CPU intensive tasks, but I think this rules out overloading the PSU as the cause of the issue.
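For anyone who wants to redo this check with their own meter reading, here is a rough Python sketch that goes from a wall reading back to component load and expresses it as a share of the PSU rating. The flat 80% efficiency is an assumption, and the function names are hypothetical.

```python
def dc_load_watts(wall_watts: float, efficiency: float = 0.80) -> float:
    """Estimate component load from a wall reading (assumed flat efficiency)."""
    return wall_watts * efficiency


def load_fraction(wall_watts: float, psu_rating_watts: float,
                  efficiency: float = 0.80) -> float:
    """Estimated component load as a fraction of the PSU's rated output."""
    return dc_load_watts(wall_watts, efficiency) / psu_rating_watts


if __name__ == "__main__":
    wall = 250.0    # watt-meter reading reported in this thread
    rating = 520.0  # PSU rating from the specs in the first post
    print(f"estimated load: {dc_load_watts(wall):.0f} W")
    print(f"share of rating: {load_fraction(wall, rating):.0%}")
    # Even with a lossless PSU the load could not exceed the wall reading.
    print(f"upper bound: {load_fraction(wall, rating, efficiency=1.0):.0%}")
```

Whatever the real efficiency is, the component load cannot be higher than the wall reading itself, so a 250w reading stays under half of a 520w rating.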

All that's left is to test my PSU's individual rails for abnormal voltages. If that proves fruitless, I'm back to square zero... I hope that's not the case. I'll have to pull out the multi-meter and test the PSU soon. I'll post again what I find.

Thanks again Sedivy for your continued support!
 


Thanks again, Sedivy. I invited my electrician friend over this afternoon to check every rail with a digital multi meter. Here is what we found:

24 Pin Connector:
Pin 1: 3.42
Pin 2: 3.42
Pin 4: 5.15
Pin 6: 5.14
Pin 10: 12.29
Pin 11: 12.29
Pin 12: 3.42
Pin 13: 3.41
Pin 21: 5.15
Pin 22: 5.15
Pin 23: 5.15

6 Pin GPU connector:
All checked out, same as above

8 pin MOBO Connector:
All checked out, same as above

*The 6-pin and 8-pin connectors both showed the exact same voltages as the 24-pin connector (all were proper when compared with diagrams found online). We measured each voltage for 25 seconds straight, ensuring there was no wobble in the voltages. They were all very steady, as expected from a quality PSU like the S12-II. All these tests were run with the PSU removed from the case and a paperclip as a jumper.

From what I can tell all of these are within the 5% and 10% margins of error described in the docs you posted. Does this confirm I have good power going to my system? Is it possible my voltages are different when under load? If not, I'm back to square one!
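The 5% check can be sketched in a few lines of Python. The readings are the highest values per rail from the list above; grouping the pins into +3.3V, +5V and +12V rails by their readings is my assumption, and the function names are made up for illustration.

```python
# Nominal ATX rail voltages.
NOMINAL = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}


def deviation_percent(measured: float, nominal: float) -> float:
    """Signed deviation from nominal, in percent."""
    return (measured - nominal) / nominal * 100.0


def within_tolerance(measured: float, nominal: float,
                     tolerance_percent: float = 5.0) -> bool:
    """True if the reading is within +/- tolerance_percent of nominal."""
    return abs(deviation_percent(measured, nominal)) <= tolerance_percent


if __name__ == "__main__":
    # Highest reading per rail from the measurements in this post.
    measured = {"+3.3V": 3.42, "+5V": 5.15, "+12V": 12.29}
    for rail, volts in measured.items():
        dev = deviation_percent(volts, NOMINAL[rail])
        ok = within_tolerance(volts, NOMINAL[rail])
        print(f"{rail}: {volts:.2f} V ({dev:+.2f}%) "
              f"{'OK' if ok else 'OUT OF RANGE'}")
```

All three rails come out between roughly +2.4% and +3.6%, so they pass a 5% limit. These were no-load readings taken with the paperclip jumper, so this says nothing about behavior under load.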

Let me know what you think. Thanks.



 
Voltages will be slightly different under load, but no, your measurements were good. You want 3-5% and no more. You are actually well within 5% and closer to 3% and these are good results, I would say your psu is a-ok.
There are two things I can think of to try. First, get furmark and get a definitive answer on where your benchmarks are when you push your gpu, independent of game related video driver issues. If no throttling of clocks, then this is something game/driver related and it's a matter of finding either the right driver version or the right video setting.
If it does throttle, then try booting a live usb ubuntu and run furmark from it. This should rule out windows specific driver issues. If still shifting to lower clocks when not supposed to, then the only thing I can think of is to contact gigabyte about this. Sometimes there are firmware updates for gpus that help resolve issues with the cards. If not, then rma.
 


THIS, is exactly my problem: https://www.youtube.com/watch?v=6N22v_iIcc0
Except mine also occurs in all other benchmarks. Can't believe it took me so long to find this video.

Anyway, Furmark showed no fluctuation in the utilization readings (100% the whole way), however GPU clocks were fluctuating between 925 and 965MHz. Strange. Furmark is an OpenGL benchmark, yes? At this point I have no idea what any of these results mean, but I am happy to find others having the same issue. I hope this rules out my drivers as the problem. I'll keep looking into the issue. The power limit fix I already had seems to be almost working. ALMOST.

I can try contacting GIGABYTE directly, if you don't have any other ideas.

Thanks again for all your help!

 
Lol and right back to the start of the troubleshooting :) Gah, I thought that was the power saver mode vogner16 was talking about in radeon settings. I don't have an amd card so can't really advise you on the settings at all, but it sounds like the fluctuating clocks were due to a power push like the one in the video. 925 is the stock clock, right? And it doesn't drop below it now when gaming? Temps are ok?
 


Nope, issue still fully present, just less frequent with +8 power limit. Temps are just fine. What do you mean power push? Do you mean increasing the voltage limit? That was my first idea in the original post.
 
Well, unless I'm mistaken, he's raising the power limit of the card in the video, to +5 or whatever it was. So with a higher limit you'll have higher clocks occasionally, but I thought it would now have no issues sitting at stock clock. At this point I'd definitely contact gigabyte (or even amd), explain this about the power limit, and ask them what to do to make the issue go away. Clearly others have run into this. If you need to push your power limit more, how much is safe? Can you get the same effect another way? I don't know enough about this to advise you, and I think you'd be better off with the manufacturer's recommendation just so you don't push your card past its limit.
 


Sorry for the delay. So far the only two things that helped were raising the power limit in AMD overdrive, and raising the power limit in MSI afterburner. I will contact Gigabyte and AMD soon to see if either of them can help me. I'll let you know what they say and then I'll select a best answer. Thanks a bunch for all your help! Cheers! :)
 


I just got off the phone with AMD customer support. (Thanks, Ionis!)
After explaining my situation, here is what they said:

Possible Causes - Thermals, Power, Hardware Failure.
Solutions- Try another computer to isolate the issue, RMA otherwise

He specifically stated that this being a driver issue was extremely unlikely. As he put it, a driver issue would cause artifacting or blue screens, not core clock throttling. Though, he still recommended I try running DDU and install 17.12 or 11.1 driver versions as they are the most stable. He recommended WHQL versions.

He recommended trying the card in another computer (I plan to soon, I'm away from home right now.) to isolate the problem and ensure the GPU is the real cause. He did say, however, that increasing the power limit was an indicator that it may be some power related issue. Either in my PSU or my card's power handling mechanisms. He mentioned that my methods for checking the power supply are correct.

If I isolate the card and the problem is still present, it means faulty graphics hardware. AMD recommends RMA from gigabyte in that case.

He also mentioned that faulty VRAM would be causing similar issues, but that would also be generating blue-screens if that was the problem.

It looks like you were right on about everything! That's impressive.:ouch: Either way, the call was not super helpful, but I am glad to have our suspicions confirmed.:)

After I test in another system, I will decide if I even want to bother with Gigabyte's customer service(I heard it's terrible). I will get back to you when I do.
 
Oh thanks for the feedback, that was a lot more detail from amd support than I thought you'd get. And yes, checking your card in another system will pretty much rule out other components if it keeps happening. Regarding RMA, you can always forward them what amd said and what you've tried so far. Considering it's been a pretty thorough troubleshooting, you shouldn't have too much trouble with the claim.
 


I just got done testing in a completely different rig with a completely different PSU and the latest AMD drivers. (I tried all the settings you suggested all over again)... Same symptoms.

I think it's time to call it on the old R7 370. Time of death, 11:44pm 2/20/2018.

I'll have to see if I can RMA with Gigabyte. It should be easy, seeing as I have run all the necessary tests and isolated the issue. Hopefully they will have a replacement in stock, but who knows with the GPU shortage these days. Plus, it's not really dead yet, just... malfunctioning a bit. 😉

My eternal thanks go out to everyone who responded to this thread. Particularly you, Sedivy, who is likely the only one reading this anyway. Thanks for being patient with me and my problems. I actually learned a lot about PSUs in particular. Thank you!

For anyone reading this after the fact, here is some advice:

Literally the problem description: https://www.youtube.com/watch?v=6N22v_iIcc0 Also try the power limit adjustment in the video.

Step 1: Check for all the things I mentioned as possible causes in my first post: thermal throttling, and make sure you have v-sync / freesync / anything-sync turned off, etc...

Step 2: Check Radeon Settings for everything Vogner16 mentioned.

Step 3: Isolate the problem by testing the card in another rig, if you have access to such a thing. If not, try another PCI-e slot and see if that helps.

Step 4: Check your power supply's total draw using a watt-meter. If it's higher than or near the limits of your power supply, consider upgrading it.

Step 5: Check your PSU's individual rails for fluctuating voltages or other irregular signs using a multi-meter. If your PSU is not from a trusted brand, check a review of your particular model on JonnyGuru. If it has low v-droop and not a lot of wiggle then it's probably not the issue.

Step 6: If you are sure it's the card and not another factor, go for RMA from whichever board partner you bought from. Apparently this is a common issue with the R7 370, especially the factory OC ones.

If you got this far and the problem is still not fixed, you might be SOL, captain. Sorry, and good luck in your journey.



Thanks again to everyone that gave advice, and special thanks to Sedivy, the PSU expert, especially! Cheers!

-Bahazbz