[SOLVED] Is anyone familiar with this behavior?


Phaaze88

Titan
Ambassador
I'm not sure whether it's hardware or software related, so I thought to post here...

A) Playing a game with a browser running in the background:
-Character/avatar will stop moving, but you can still see them 'breathing' and the audio loops; the rest of the game is unresponsive, though.
-I can alt-tab or Windows key out to the browser, but clicking on anything will 'load forever'.
-If I mouse over my icons in the taskbar, they get stuck like that and the clock will freeze.
-Ctrl+Alt+Delete, no response.
-Backspace+F4, no response.
-I am forced to power off/restart.

B) Running the Asus RealBench 8-hour stress test overnight. If I wake up before the test is finished:
-Everything 'appears' to be running, but the first oddity I notice is that my fans aren't running as high.
-RealBench is reporting that the cpu is idling, kinda bouncing between 0-7%. The main window's timer is still running, but the smaller window that pops up along with it has frozen - a few hours in, according to its timer.
The Luxmark render window has frozen as well.

-Hwinfo's still 'running' too: the cpu and gpu are idling, the clock's still running, all the other numbers and stuff are still running... I can even close the app without the system 'freezing' - granted, it's already frozen past this point.
-When the test does finish, it freezes.
-Ultimately, it's frozen and I'm forced to power off/restart.

C) Browsing the web and watching YouTube videos all day long:
-Not a problem at all.


Because the system doesn't blue screen or black screen during these soft freezes, I can't seem to get any recent dump files for this. Running WhoCrashed Home Edition only pulls up dumps for 2 errors I troubleshot about 2 weeks ago.
Event Viewer and Reliability Monitor pull up a bunch of the following, but I can't make heads or tails of them:
-65, AppModel-Runtime
-10005, DistributedCOM
I figure they're from the forced shut offs I've had to do, because the system doesn't crash on its own...
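
For anyone wanting to sort through a pile of entries like these, here is a rough Python sketch for tallying them. It assumes the log was exported from Event Viewer as CSV with `Source` and `Event ID` columns; the column names, the `tally_events` helper, and the sample rows are all assumptions for illustration, not taken from the actual log.

```python
import csv
import io
from collections import Counter


def tally_events(csv_text):
    """Count (event ID, source) pairs in an exported event log."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        # Assumed column names; adjust them to match the real export header.
        key = (row["Event ID"].strip(), row["Source"].strip())
        counts[key] += 1
    return counts


# Made-up sample rows for illustration only.
SAMPLE = """Level,Date and Time,Source,Event ID,Task Category
Warning,2021-01-02 08:00:00,AppModel-Runtime,65,None
Error,2021-01-02 08:00:05,DistributedCOM,10005,None
Warning,2021-01-02 21:30:00,AppModel-Runtime,65,None
"""

for (event_id, source), n in tally_events(SAMPLE).most_common():
    print(f"{event_id:>6}  {source:<20} {n}")
```

If the counts jump right after each forced restart, that would fit the guess that the entries come from the shutoffs rather than from whatever causes the freeze.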

What I've tried so far with no success:
-Updated the BIOS and reset CMOS; running the cpu at stock, no overclock.
-Updated Windows to 20H2.
-Ran DISM and sfc scans from Command Prompt.
-There's no manual gpu overclock, as I haven't needed to do that since I hybrid cooled it. I do raise the power and temp limit sliders.
-Disabled hardware acceleration in Firefox browser.
-Reconnected all the psu cable connections.
-Reinserted the gpu, and also remounted the Fractal Celsius cooler.
-Used DDU when switching between drivers 457.30 and 460.89.

I reinstalled Windows back in October-November.

In progress: Gpu power limit set to 50%.
 
Not sure if you checked it, but it sounds like the disk is too busy writing. Is the RAM getting close to full by any chance when this happens? Is the disk r/w utilization close to 100%? I had it happen with two VMs open (12 GB RAM each): running a game triggered the system to start writing to the page file, and everything got sluggish.
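
A rough sketch of that check in Python, assuming you read the numbers off Task Manager by hand. The `Sample` class, the `looks_like_paging` helper, the 90% thresholds, and the example figures are all hypothetical; they only show the reasoning that paging is the likely suspect when RAM is nearly full and the disk is pegged at the same time.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    ram_used_gb: float      # "In use" memory read from Task Manager
    ram_total_gb: float     # installed RAM
    disk_active_pct: float  # disk "Active time" percentage


def looks_like_paging(sample, ram_threshold=0.90, disk_threshold=90.0):
    """Return True only when RAM is nearly full AND the disk is saturated."""
    if sample.ram_total_gb <= 0:
        raise ValueError("ram_total_gb must be positive")
    ram_fraction = sample.ram_used_gb / sample.ram_total_gb
    return ram_fraction >= ram_threshold and sample.disk_active_pct >= disk_threshold


# Hypothetical readings: one taken during a freeze, one during normal use.
print(looks_like_paging(Sample(15.5, 16.0, 100.0)))
print(looks_like_paging(Sample(8.0, 16.0, 5.0)))
```

If only one of the two is high, the page file is probably not the culprit and it is worth looking elsewhere.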
 

Phaaze88

Titan
Ambassador
I'd forgotten to select a BA!

Not sure if you checked it, but it sounds like the disk is too busy writing. Is the RAM getting close to full by any chance when this happens? Is the disk r/w utilization close to 100%? I had it happen with two VMs open (12 GB RAM each): running a game triggered the system to start writing to the page file, and everything got sluggish.
While one of my games was running, I took a screenshot of Task Manager from my secondary monitor. That doesn't seem to be the issue at all.
View: https://imgur.com/R6aFzB8
 

Phaaze88

Titan
Ambassador
But then you'll never REALLY know if it was the PSU or the OC, right?
This sorta thing would keep me up at night. :LOL:
I mean, my cpu is still overclocked at the moment, but I already know how much Vcore I need for 4.6, 4.5, 4.4, and 4.3 GHz - 4.3 GHz with no AVX offset, cache clock 2.7 and 3.0...
I guess I stress tested the crap out of the system to find those values. Once I put the D15S back on, I'll likely have to dial it back down again - not a big deal though.

I recently started looking into gpu undervolting after so long. Combined with the DIY hybrid cooling, it looks promising.
Nvidia's Gpu Boost is one big gpu OC cockblock; simply raising the power limit slider, core and memory clocks is not the way to do this.
 
I recently started looking into gpu undervolting after so long. Combined with the DIY hybrid cooling, it looks promising.
Nvidia's Gpu Boost is one big gpu OC cockblock; simply raising the power limit slider, core and memory clocks is not the way to do this.
I had a pretty interesting experience with my Vega 64 and undervolting. The long and the short was that yes, undervolting did allow me to obtain higher clocks for longer periods of time on the stock cooler. But once I put on the aftermarket cooler, the best performance came from giving the card just under the high voltage mark that it normally boosts to. I found the sweet spot to be 1.17 V, and the normal auto boost was 1.2 V.
 
Yeah, kinda apples to oranges when comparing OC on NVIDIA and AMD.
My Titan Xp behaves similarly to what you describe, but it doesn't drop off that much after 5, 20, 60 mins. Less than 1 fps difference, I'd say. If your 1080 Ti is really dropping off after getting warmed up, that, again, points to a card starting to fail.

Actually, since my Titan Xp is so similar to your 1080 Ti, here's what I'm currently using for day-to-day. Maybe a good starting point for you...?
View: https://imgur.com/a/sOzBOGj
 

Phaaze88

Titan
Ambassador
If your 1080 Ti is really dropping off after getting warmed up, that, again, points to a card starting to fail.
It's not really warming up at all - gpu core stays under 40C - it's just what I've noticed from games and benchmarks.
Ideally, you don't want Gpu Boost dialing back for anything; the less frequently those limits are hit and the cooler the gpu runs, the better the overall scores and performance end up becoming.
It's all minor stuff, but interesting nonetheless.

1080 Ti FE, Gaming OC, and Titan Xp all have the same max power limit of 300 W... if I had a bit more power budget to work with, I likely could push it a bit more. Oh well.
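
For reference, the power limit arithmetic can be sketched in Python. It assumes the slider is a percentage of the card's default board power and that the default is 250 W, which would put the 300 W ceiling at 120%; both are assumptions, and `power_limit_watts` is just an illustrative helper.

```python
def power_limit_watts(default_watts, slider_percent):
    """Convert a power-limit slider percentage into watts."""
    return default_watts * slider_percent / 100


DEFAULT_BOARD_POWER_W = 250  # assumed default for these cards

print(power_limit_watts(DEFAULT_BOARD_POWER_W, 120))  # 300.0, the max limit
print(power_limit_watts(DEFAULT_BOARD_POWER_W, 50))   # 125.0, the 50% test setting
```

Under those assumptions, the 50% setting caps the card at about 125 W, so there is only around 50 W of headroom between default and the ceiling.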

Actually, since my Titan Xp is so similar to your 1080 Ti, here's what I'm currently using for day-to-day. Maybe a good starting point for you...?
View: https://imgur.com/a/sOzBOGj
Nay, that's an even higher core clock than what I had applied before (+75). I observed the card running into the power limit and throttling back more frequently - compared to stock - so for a time, I left it at stock... it can be improved, so I'm working on that.
 
GPU failure isn't always 'thermal' failure. Your card could be failing due to GPU core trace issues, electromigration at poor solder points, the list goes on. There's a whole host of issues that can cause failure that aren't directly thermal issues.

There are so many posts, even here on Tom's, where people are convinced that their card isn't failing because thermals are low, or at least in check. Thermals are only one part of the GPU failure/OCing picture.
 
I was referring to your original issue that prompted this whole thread. If you don't OC back up to where it was before, you'll never know whether it really was the PSU or the GPU that was the issue (failing).

Anywho. At least you have it stable now - that's the most important part.