CPU Overclock Suddenly Unstable: Prime 95 Stable - but Games Crash to Desktop!


Ransome

Hello.
My main specs:
OS: Windows 10 Pro 64bit
GPU: Asus 1080 Ti Strix OC Edition
CPU: Intel Core i5-3570K
MB: Asus Sabertooth Z77
RAM: 16GB G.Skill Ripjaw-X DDR3 1600MHz (XMP profile)
PSU: Corsair RM850i


I used a CPU overclock of:
XMP Profile
4.4GHz x44 multiplier
Voltage: Offset -0.005 or +0.005
LLC: Extreme

One day, after months of running smoothly and with good temps and gaming stability...
BOOM. Games start to crash to desktop consistently, frequently and fairly quickly.
Thought it's my GPU OC, so I tried setting to default and even closing MSI AB but it didn't help.

Tried upping the Voltage to +0.005. Worked... then after a few days, BOOM, not stable again.
OK so I tried +0.010. Same thing. Fixed...then after a few days it is suddenly not stable and inexplicably crashes.

Tried different voltages like +0.020 with LLC HIGH 50% instead of Ultra High. Worked for a week, then one day stopped. Tried going back to OCs that were stable before and they didn't work.

So I tried upping the voltage more but it only generates excessive voltage without real stability.

I also tried 4.3GHZ at -0.030 0.040 yesterday, with LLC HIGH - and it worked well - only to stop working this morning.

The way I test it is using Devil May Cry 4 Special Edition Benchmark.
It will almost always crash in the final stage's final seconds of the benchmark. Seemingly for no reason because the clocks and temps are flawless. (50s-60s C CPU temps).
Also tried to run The Witcher 3, stand in Novigrad crowded area and wait - it will CTD after a while.
Other benchmarks work without a crash:
Superposition, Valley and 3DMark Fire Strike & Time Spy Demo.
Games like Dishonored 2 might crash unexpectedly.

Everything used to be so smooth.
The PC is super clean - I cleaned it so freaking thoroughly a while ago and yesterday again.
Tried clean installing multiple Nvidia Drivers.
Tried Windows Memory Diagnostic tool
I don't understand what's wrong and it drives me crazy.
Been like that on and off for weeks. I can't properly Game when my games will CTD so often and so quickly.
 
Solution
It passed. No errors, failed tests or stability issues reported by RealBench, and Event Viewer seems clean without WHEA.
That's good. You could add an extra 0.010v and call it a day. Should last long enough.
The PC kinda crawled when trying to open other apps like EventViewer - but I guess it's normal during tortures.
That's true. It takes a long time to open anything because the CPU is maxed out.
What would have happened if I tried 4.4, 4.5 or 4.7 or even 4.8 like some people go for?
The quality of every chip is different. Yours might run 4.4GHz at 1.300v, another guy's 3570K might run 4.6GHz at 1.300v. So you have to test it yourself and be prepared to spend some time testing.
And did you see my last comment...
Hey guys - regarding Windows possible corruption. I just ran these commands:
DISM.exe /Online /Cleanup-image /Restorehealth
and then:
sfc /scannow

To verify system files and check for corruptions, and I got this report:

C:\WINDOWS\system32>DISM.exe /Online /Cleanup-image /Restorehealth
Deployment Image Servicing and Management tool
Version: 10.0.17134.1
Image Version: 10.0.17134.228
[==========================100.0%==========================] The restore operation completed successfully.
The operation completed successfully.
C:\WINDOWS\system32>sfc /scannow
Beginning system scan. This process will take some time.
Beginning verification phase of system scan.
Verification 100% complete.
Windows Resource Protection found corrupt files but was unable to fix some of them.
For online repairs, details are included in the CBS log file located at
windir\Logs\CBS\CBS.log. For example C:\Windows\Logs\CBS\CBS.log. For offline
repairs, details are included in the log file provided by the /OFFLOGFILE flag.

Is this normal? How do I fix this? Can't find a proper answer online.
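The sfc output above points at CBS.log for details. SFC's own entries in that log are tagged with "[SR]", so one way to see which files it could not repair is to filter for that tag. Below is a minimal Python sketch of that filter. The function names are made up for illustration, and the "cannot repair" keyword is an assumption about the wording in the log, so adjust it to whatever your log actually says.

```python
from pathlib import Path

# Default log location quoted in the sfc output above.
CBS_LOG = Path(r"C:\Windows\Logs\CBS\CBS.log")


def sfc_entries(lines, keyword=None):
    """Return the log lines written by SFC (tagged "[SR]").

    If keyword is given, keep only lines containing it (case-insensitive),
    e.g. "cannot repair" to narrow down to files SFC gave up on.
    """
    hits = [line.rstrip("\n") for line in lines if "[SR]" in line]
    if keyword is not None:
        hits = [line for line in hits if keyword.lower() in line.lower()]
    return hits


def read_sfc_entries(path=CBS_LOG, keyword=None):
    """Read the log from disk (may need an elevated prompt)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sfc_entries(handle, keyword)
```

Calling `read_sfc_entries(keyword="cannot repair")` from an admin Python session should list the affected files, which at least tells you whether the corruption touches anything that matters before deciding on a clean install.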

 


Already tried method 1 and 2 (which are both basically the same), as I wrote in my last post where I pasted the commands, and it didn't work, twice.
It failed both after DISM, and in Safe Mode after DISM.
Method 3 seems overly complicated and it seems easier to Clean Install Windows. But I could be wrong.
 
Darn. Ideas?
Running sfc /scannow again in Safe Mode with Networking as we speak (cmd admin). If it doesn't work I'll try DISM again, reboot to Safe Mode with Networking and run sfc again.

This is tiring. lol. Why would I have these issues. I didn't mess with anything.
 
I'm not an expert at windows. I can do general troubleshooting but nothing too serious.
I would leave it alone till Windows breaks, then install it again. You don't need to do this, it's just what I would do. I have a folder with all the software and config I'm going to use, so I don't mind reinstalling Windows.
Why would I have these issues. I didn't mess with anything.
Bad update, virus, malware, Bad OC, etc. Too many reasons. Last time I reinstalled was because Microsoft's major update at that time updated my windows to oblivion.
 


Which test/s exactly should I run Prime95 with? (Blend, Small FFT, Large). Should I tweak anything?
Can you please link me the best version of Prime 95 from a safe source?
Also what do I look for? Just crashes and BSODs or what?

Regarding RealBench, never heard of it, how exactly do I use it, and where should I download it from?
8 and 4 hours sounds so unhealthy at high temperatures and load, but I see your point.
(Of course I can google it, but there are many unsafe sources and I'd rather just ask someone who has more experience with them than me.)
Thank you.


Do you still think I should Clean Install? It's a big step.
 
Can I please get the simple version,
Which test is best to run for temps and stability tests on Prime 95? Small FFT, Large or Blend?
So I don't waste 4 hours on the wrong torture test every time...
How important is it to look at Event Viewer errors, and what are the worst errors?
Because I assume some errors and failures are inevitable at such tortures.
How about temps?

What's the difference between Prime and RealBench, isn't one enough?
(I'm trying to save time here, and to protect an already old, possibly degrading CPU from needless heat and stress.)
Anything I should note or select when Running RealBench?
Thanks.

(Edit: about clean install, I just can't fix the stupid SFC errors, but I don't know how important it is. Clean Install is brute force solution but might take less time than all these troubleshooting.)
 
Blend.
So I don't waste 4 hours on the wrong torture test every time...
If you don't want to waste time, run the test for 15 minutes. If it doesn't crash/freeze, up the voltage for extra stability (about 0.005v).
Prime and RealBench
Each uses different ways to stress test. I personally don't like prime95 but it gets the job done.
OCCT is popular too. It runs calculations and reports errors if the OC is unstable.
http://www.ocbase.com/index.php/download
(I'm trying to save time here, and to protect an already old, possibly degrading CPU from needless heat and stress.)
You could underclock to 4.0GHz and call it a day.
 

Ran Prime 95 Blend for about 2:30 hours, then got back home. It didn't crash at all, but after 1:30 hours one of the workers stopped on one core. I'm guessing that's not good?
Also got some WHEA Warnings in Event Viewer. Are these bad or acceptable?
I then rebooted, upped the Voltage by 0.05 once, then ran Small FFT for 25 minutes - got some WHEA warnings but no crash and workers running. Then +0.05 again but lowered the LLC.
Still getting plenty of WHEA Warnings in a row, but no system crash or workers stopping, for now.
Seems LLC Medium or less will cause tons of WHEA Warnings...
If I set LLC to Auto my OS won't boot or will freeze often or BSOD. Not sure why LLC is such a MUST in my system. I read infinite topics about it online - some people say LLC is a must-have and they can't go without, others avoid it like wild-fire. There's really no solid answer. I just wish Auto (default) LLC would be stable.
Not entirely sure how to proceed.
Didn't have time to switch to RealBench or OCCT yet.

I want to find my stable OC setup quickly then probably go for the clean install process.
 
Level 4 or 5 LLC + keep increasing voltage till you don't get any errors (safe till 1.30v). If you still get errors after voltage boost, it's probably the windows messing with you.
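The advice above is basically a loop: bump the offset by a small step, rerun the stress test, and stop either when the run is clean or when you hit the voltage ceiling. Here is a rough Python sketch of that loop. Nothing in it talks to the BIOS; `is_stable` is a hypothetical callback standing in for a manual Prime95/OCCT run, `base_vcore` is a made-up load voltage at offset 0, the 0.005 V step is the increment used elsewhere in this thread, and 1.30 V is the ceiling mentioned above.

```python
def find_stable_offset(base_vcore, start_offset, is_stable,
                       step=0.005, max_vcore=1.30):
    """Raise the offset in small steps until a stress run passes.

    base_vcore   -- hypothetical load voltage at offset 0, in volts
    start_offset -- offset to start from, in volts (may be negative)
    is_stable    -- callback standing in for a manual stress test; it gets
                    the offset and returns True if the run was error-free
    Returns the first passing offset, or None if the ceiling is reached.
    """
    # Work in whole millivolts to avoid floating-point drift while stepping.
    base_mv = round(base_vcore * 1000)
    offset_mv = round(start_offset * 1000)
    step_mv = round(step * 1000)
    max_mv = round(max_vcore * 1000)

    while base_mv + offset_mv <= max_mv:
        if is_stable(offset_mv / 1000):
            return offset_mv / 1000
        offset_mv += step_mv
    return None
```

A `None` result corresponds to the last sentence above: if errors persist all the way up to the ceiling, more voltage is probably not the fix and the OS install becomes the next suspect.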

Not sure why LLC is such a MUST in my system.
What goes on when settings are set to AUTO is dictated by the BIOS. Sometimes the BIOS doesn't assign correct values for the OC to work, so people have to manually tweak LLC and other settings to make it work. It's the same for me. That's why it's recommended to update to the latest BIOS before tweaking.
 

Wait, both what exactly for what length?
Yup, failed, lol happens.
What shortcut? I've done exactly as instructed.
Further, I went down and lowered Vcore to -0.020 (not -0.010). Clearly not enough V.
But it took quite a while for the processor to fail, so maybe I'm getting close.

What about these yellow WHEA Warnings in event viewer?
Are they even important to bother with?
Reading about overclocking and stress/stability testing over the years, I don't recall people talking about checking for errors or warnings in Event Viewer. All they seem to care about is whether or not they crash/BSOD, and their temps.
It's only because I figured I'll check Event Viewer after frequent crashes in my TW3 tests, that I discovered those WHEA Warnings.
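For anyone who wants to check for these warnings after each stress run without scrolling through Event Viewer: WHEA events land in the System log under the Microsoft-Windows-WHEA-Logger provider, and they can be listed from the command line with `wevtutil`. The Python sketch below only builds and runs that query. The helper names are made up, and the flags are written from memory of the `wevtutil qe` syntax, so treat them as an assumption and confirm with `wevtutil qe /?` first.

```python
import subprocess

WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def whea_query_command(max_events=20):
    """Build a wevtutil command listing the newest WHEA events in the System log."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",
        f"/c:{max_events}",   # cap the number of events returned
        "/rd:true",           # newest first
        "/f:text",            # human-readable output
    ]


def recent_whea_events(max_events=20):
    """Run the query (Windows only) and return wevtutil's text output."""
    result = subprocess.run(whea_query_command(max_events),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Running `print(recent_whea_events(5))` right after a test makes it easy to see whether new warnings appeared during that run rather than hours earlier.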


Convinced? No, I never said that. I was recommended by people like you in past threads to try lowering voltages and going negative offset is generally good if you are stable and want cooler temps at lower clocks.
Nobody ever said undervolting could potentially risk the CPU or OS health, only cause instability.
In fact this is the first time I hear about "OS Corruption" from OC.

Isn't Undervolting extremely relative? - or + offset V for 1 chip will be totally different for another of the same model.
How does one even know if he is Under-Volting or Over-Volting anyway?


Yes, I have the latest version of the BIOS. I also re-flashed the same version yesterday, just to be safe, when trying to determine and solve the issue. It's version 2014, which was last updated in 2013.
I'll try upping the voltage or LLC some more if needed.
 
What about these yellow WHEA Warnings in event viewer?
I get errors when my OC crashes. Don't know about yours.
How does one even know if he is Under-Volting or Over-Volting anyway?
You OC to let's say X. BIOS sets 1.200v for that OC. Anything below stock is considered undervolting. Anything above is considered overvolting. At least that's how I define it.
Nobody ever said undervolting could potentially risk the CPU or OS health, only cause instability
If the OC is barely stable, it can cause corruption over time without crashing/freezing. Also, any OC is considered overvolting as you go above the stock voltage. Undervolting after that is basically decreasing overvolting.
I'll try upping the voltage or LLC some more if needed.
I'm still surprised your OC lasted this long. Most people adjust their OC clock and voltages regularly as time passes.
What shortcut?
He means the 15 min "dirty test" used to find a barely stable OC. If the rough OC doesn't pass 15 min or barely passes it, we apply a small voltage boost for extra stability and forget about it. Or we keep adjusting.
 

You are right, I used to do smaller tests whenever tweaking my OC, kinda like zebarjadi said above, these "dirty tests". But occasionally I would leave it for longer runs, like an hour or so.
It's just my personal common sense and instincts made me not feel very comfortable about torture testing my CPU and system at maximum: heat, voltages and loads - for long periods.


Alright, I just finished a long 3.5 hours of Prime 95 BLEND Stress test.
That was after also running 45 minutes of the SMALL FFT test.
Both came out with zero crashes, and zero errors or even warnings in Event Viewer.
That's with the same Overclock I used while testing games yesterday (TW3 long sessions and DMC4 benchmark):
4.3 GHz, -0.010 Offset, LLC HIGH 50% and XMP.

I'm considering upping the voltage by an extra 0.005 (to -0.005) just to have the extra stability, like zebarjadi suggested.
I'm thinking about running an extra hour of OCCT or RealBench, maybe.
zebarjadi How long should I test OCCT? Anything I should tweak there?
Thanks.
 
4.3 GHz, -0.010 Offset, LLC HIGH 50% and XMP.
You seem to have a thing for negative offset. I don't use it unless I have overheating problems.
It's just my personal common sense and instincts made me not feel very comfortable about torture testing my CPU and system at maximum: heat, voltages and loads - for long periods.
I kept my previous PC on for 2 years straight (except for power outages), still working after 12 years.
Zebarjadi How long should I test OCCT? Anything I should tweak there?
Should have infinity mode. If it detects errors it should show on the graphs (pops up) afterward saying error detected.

The thing is, I don't like testing for hours and hours just for finding stability. I just overvolt slightly and forget about it.
 
If you overvolt, yes you are fine. Trying to get the most performance out of the lowest voltage, you need to test or you are going to end up in the same place. I don't understand: you came here for a solution, I gave it to you, but you want to do the least possible testing, which leaves you back at square 1.
 


bmockeg - in RealBench do I need to do Benchmark or Stress Test?
When I run Stress and set RAM up to 16, it closes the program saying I don't have enough memory. How do I set it up exactly?

I appreciate the help. Feel like I'm nearing a solution. Found a stable OC hopefully.
Think I will also do a Clean Install next to fix other accumulated issues, as you recommended.
 
I don't understand: you came here for a solution, I gave it to you, but you want to do the least possible testing, which leaves you back at square 1.
Wait, I just read your comment again. What are you referring to? I haven't replied since.
I actually did what you and the others suggested, so I don't understand what you are getting at:
I've run Small FFT for around 45 minutes (after 15 passed) after increasing my Voltage to -0.010 Offset -to check basic rough stability.
Then I switched to Blend and did a long run. Almost 4 hours. No crashes or errors.
Then I did OCCT for 2 hours in 2 separate runs. It came out free of errors in all the graph reports.
Finally I did about an hour of RealBench Benchmark test, did quite a few passes across all tests. No errors and decent temps.

I'm NOT trying to take any shortcuts - or I wouldn't be spending over 2 days just trying to stabilize my computer. But at the same time I can't spend days on end, enslaved to these tests like some Overclockers are.
And I didn't even count all that time troubleshooting system issues and even locating the problem - eliminating as many sources as possible.
And I have further work cut out for me still, if I end up Clean Installing my system.
I just want to find a stable OC before that.

If you overvolt, yes you are fine. Trying to get the most performance out of the lowest voltage, you need to test or you are going to end up in the same place.
I don't understand what you mean here, can you be clearer?

Regarding "Undervolting": You guys keep talking about Negative Offset as a form of undervolting? But that can't be precise.
Because many times you add extra +0.XXX Offset Voltage, only to find you don't have enough Vcore to run higher OCs. So what I'm saying is that Negative/Positive Offset, at least how I understand it, has nothing to do with Under/Over-Volting.
Because the starting point of Offset is rather unclear and not truly 0.

So how do you figure out if you are Under- or Over-Volting? How do you even determine what the base voltage is?
If I leave the voltage at AUTO (default), it will apply EXCESSIVE voltage (i.e. overvolting). For instance, I remember trying 4.4GHz and 4.5GHz at Auto Voltage (stock voltage) and it jumped to 1.4, 1.5, 1.6 or higher Vcore! The BIOS was overvolting dangerously!
In other words Offset +0.005 and Auto are not 0.005 apart.
If someone can clear this up for me, it will really help.
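One way to picture the question above, as a hedged sketch rather than a statement about how this particular board behaves: in offset mode the board is generally understood to add the offset to the voltage the CPU itself requests (its VID) at the current multiplier, while Auto lets the BIOS pick its own value, which can be far higher. The Python below uses a made-up VID table purely for illustration; real VIDs differ per chip, and the function names are hypothetical.

```python
# Hypothetical VID table: voltage the CPU requests at each multiplier.
# Real values differ per chip; these numbers are made up for illustration.
EXAMPLE_VID = {38: 1.150, 43: 1.240, 44: 1.260}


def offset_vcore(multiplier, offset, vid_table=EXAMPLE_VID):
    """Target Vcore in offset mode: the chip's requested VID plus the offset."""
    return round(vid_table[multiplier] + offset, 3)


def classify(multiplier, vcore, vid_table=EXAMPLE_VID):
    """Label a voltage relative to the chip's own request at that multiplier."""
    vid = vid_table[multiplier]
    if vcore < vid:
        return "undervolt"
    if vcore > vid:
        return "overvolt"
    return "stock"
```

With these example numbers, `offset_vcore(43, -0.010)` lands slightly below the chip's request, while a 1.4 V Auto reading at x44 is well above it. Under that assumption, the reference point for a negative or positive offset is the VID, not whatever Auto applies, which would be why Offset +0.005 and Auto are not 0.005 apart.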