Boot Loop & Stability Issues w. OC following delid

dueprocessofflaw

Long story short:
I'm getting caught in a boot loop when making manual changes to voltages and clocks in BIOS, unless I reset to default settings first.

If I make a manual adjustment after a different manual setting has already been saved, Windows won't start; it just loops back to BIOS over and over.

I can get it to boot if I reset BIOS to default settings and then key in the manual voltage / clock info, but any additional change starts the loop over. That means any time I want to change clocks or voltages, I need to reset to default settings, reboot, and then redo all my manual adjustments: disabling C-States and EIST, changing SATA settings, enabling XMP, etc.

Details below...

Build:
CPU: i7 4790k
Mobo: Asus Maximus VII Impact
RAM: G.Skill TridentX 2400MHz @ CL10
GPU: Nvidia Titan X Pascal
PSU: Corsair AX860i
Storage: 512GB Samsung 960 Pro (OS); 1TB Samsung 850 Evo x2 (storage); 6TB HGST Deskstar NAS x2, RAID 1 (backup).

I've been running my CPU OC'd for years... history:
2/2015 - 6/2015: Stock (Corsair H100i);
7/2015 - 2/2016: 4.6GHz Core @ 1.22v, 4.4GHz Cache @ 1.20v (Corsair H100i);
2/2016 - 2/2017: 4.7GHz Core @ 1.25v, 4.4GHz Cache @ 1.20v (Corsair H100i);
2/2017 - 12/2017: 4.5GHz Core @ 1.20v, 4.4GHz Cache @ 1.20v (Noctua D15S).

I've been temp limited. I don't push my chip past 80 deg. Celsius under successive IBT runs (all prior OCs maxed at 78; the move from 4.7 to 4.5 circa 2/2017 was driven by a desire for peace of mind and the change to air cooling from the AIO I had previously). All prior OCs were stable -- I never had a crash once I had them dialed in as above.

I delidded my CPU last night, using the rockit 88 kit (TG Conductonaut between die and IHS), and it went pretty smoothly considering it was my first try.

The PC boots up fine... but my BIOS reset to default settings. I lost a RAID 0 array (the Samsung 850s) -- that's what I get for using a mobo-based RAID utility (my backup HDDs are set up using Storage Spaces, so those were fine). I can restore via backup; it's a pain, but not a problem, and I've pooled the 850s using Storage Spaces to avoid a similar problem in the future.

But there seem to be... "consistency" issues w. my BIOS now.

I can boot to Windows fine under default settings and w. the XMP profile, but Asus feeds the chip an absurd amount of voltage -- default 1.28v at stock clocks...

I can boot at 4.5GHz @ 1.225v so long as I leave the cache stock, and I did some IBT runs; max temps are much improved, approx. 10-15 deg. cooler than I was getting at 1.20v prior, so that's pretty good, and it should leave me room to up my clocks a good bit.

But, here's where it gets tricky...
For starters, I can't get Windows to boot under my prior OC settings. If I set my core voltage to 1.20v (still running at 4.5GHz), which it had been running at for the last 10 months, Windows won't boot; I get stuck in a BIOS loop. Attempting to set my cache OC to 4.4GHz @ 1.20v leads to the same result.

And, here's where it gets really weird:

Following the boot loop from attempting to revert to 4.5GHz core at 1.20v, I tried going back to 1.225v, which had booted fine and had just made it through 30 runs of IBT... same boot loop. The only way out of the loop is to reset BIOS to default settings and reboot; only then can I manually key in the 4.5GHz and 1.225v again.

It almost seems software / BIOS related; the key seems to be resetting to default settings before entering manual info. I can only get the 4.5GHz / 1.225v to work if I reset to defaults first, not if I'm changing from other manual settings (and I can't even get it to boot w. Auto clocks / voltages when coming from manual settings, either).

Has anyone else run into problems like this before?
I haven't pulled the CMOS battery yet, because I'll have to rip out the D15S to get to it.
 
Strange indeed. I'm surprised the array data was deleted. Stripes all disappeared? No way to rebuild the stripe without a format? I haven't messed with RAID for a while, though. It's possible the TIM used for the delid caused a "hot spot". I had someone last week with an issue that turned out to be the liquid metal sliding off the core a bit, but their symptom showed up in the reported temps. Outside of something delid related, nothing else changed.
 
Update: got home from work, booted up, plugged in manual voltage... it's taken all my adjustments today without issue. We're back to 4.5GHz @ 1.2v core (I scaled down in increments of 0.005v from 1.225, w. IBT runs at each interval). I haven't tried messing w. my cache yet, but that comes tomorrow, and I'll update accordingly.
Might just have needed a moment to sit... who knows... never had an issue like it, and hoping it's all set.
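
For anyone following along, the step-down above is just a fixed 0.005v walk from the last known-good voltage; here's a quick sketch of the sequence (Python only because it's handy -- nothing was actually scripted, each point was set in BIOS by hand and checked with IBT runs):

```python
# Sketch of the vcore step-down: start at the known-good voltage and walk
# down in 0.005v steps, stress testing at each point before going lower.
start_v, target_v, step_v = 1.225, 1.200, 0.005

v = start_v
while v >= target_v - 1e-9:  # small epsilon to avoid float rounding issues
    print(f"Test point: 4.5GHz core @ {v:.3f}v -> run IBT, watch temps/stability")
    v = round(v - step_v, 3)
```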

Yeah, the array failed. Now, whether the array failed because of an initialization failure during one of the many boot loops, or whether it failed right off the bat when my motherboard reverted to default settings (setting SATA to AHCI and disabling RAID) and that was itself causing the boot loops, I'm not sure.

The actual drives (850 Evos) are fine. I striped them in Storage Spaces last night. The sustained read / write isn't as quick, but 4k random r/w isn't much different, so... probably worth it for the compatibility.

I think the delid went cleanly -- there doesn't appear to be a hotspot. All cores are within 5 deg. Celsius of each other at idle, under normal operation, and at max temp, and temps are actually pretty good across the board: a drop of 10-15 deg. Celsius on air. I'll be pushing that clock up, I think... if everything keeps working smoothly.
 
Thanks for the update. Glad things seem to be running normally again, with much improved temps. Nice build BTW :) I'm rusty on RAID, but I see what you're saying. I still have an old ARECA 1210 RAID card from years back, from the early days of consumer onboard RAID; it was a much better solution. I only used onboard RAID once or twice.

Good luck with the cache OC, and with those temps and voltage, good chance to go a bit further.
 
Cache is back at 4.4GHz @ 1.2v.

Any idea if I can increase the precision of the supplied voltage?
I'm trying to tighten it up...
Right now, manual voltage is set to 1.271, but the Mobo is supplying 1.28. The Mobo appears to supply voltage in large / imprecise increments. For instance, any voltage requested (via the manual voltage setting) in the following ranges results in the actual vcore below:
Manual voltage 1.25-1.2625... Mobo gives 1.264, w. LLC bump to 1.28;
Manual voltage 1.265-1.275... Mobo gives 1.28, w. LLC bump to 1.296.

It'd be nice if I could lock in somewhere between 1.264 and 1.28, but regardless of what I specify in BIOS, Mobo seems to give me increments of .016, rounded up. You know of any way I could tighten those increments? It'd be nice if I could feed it ~1.27x on the dot, w. LLC climb to ~1.285 or so. It was stable through ~50 runs of IBT at 1.264 w. the LLC bump to 1.28 last night, but crashed during simultaneous benchmarks this morning (testing GPU as well). I want to feed it a little more than 1.264 / 1.28, but jumping to 1.296 (w. LLC) seems a little extreme.
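
For reference, here's a rough sketch of the mapping I'm seeing (the 0.016v step size and the one-step LLC bump are just my read of the numbers above, not anything from an Asus spec):

```python
import math

STEP_V = 0.016  # apparent vcore granularity -- my observation, not a documented spec

def delivered_vcore(requested_v):
    """Requested manual vcore -> (idle vcore, vcore under LLC), assuming the
    board rounds the request up to the next 0.016v step and LLC adds roughly
    one more step under load."""
    idle = math.ceil(requested_v / STEP_V - 1e-9) * STEP_V
    return round(idle, 3), round(idle + STEP_V, 3)

for req in (1.250, 1.2625, 1.265, 1.271, 1.275):
    idle, loaded = delivered_vcore(req)
    print(f"requested {req:.4f}v -> {idle:.3f}v idle, ~{loaded:.3f}v under LLC")
```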
 
The reading in the UEFI isn't really all that accurate in real time. You would need a good DMM measuring at the voltage points to be accurate. That said, you really won't be able to tighten things in any further that I know of. It sounds like you are right on the edge of stability with that voltage. Going over 1.3 isn't really a big deal either. I ran a solid 4.5GHz on my 5820k Haswell-E using 1.325v without SpeedStep or C-states. If it were me, I'd add a bit more voltage as a buffer. Also, there are spikes that aren't shown in software that occur when a load is applied and removed. This is normal behavior, and LLC adjusts this high/low reading.
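
If it helps to picture it, here's a crude loadline-style sketch of droop vs. LLC (every number below is assumed purely for illustration -- the real loadline behavior depends on the board and the LLC level, and isn't something software reports directly):

```python
def loaded_vcore(set_v, load_current_a, effective_loadline_mohm):
    """Simple static loadline model: delivered voltage sags by I * R under load.
    A negative effective loadline crudely models LLC overcompensating, i.e.
    the loaded voltage ending up above the set voltage."""
    return set_v - load_current_a * (effective_loadline_mohm / 1000.0)

SET_V = 1.264    # idle/set vcore in volts (matches the value discussed above)
I_LOAD = 100.0   # assumed package current under IBT, in amps -- illustrative only

for label, r_mohm in [("weak LLC", 1.0), ("medium LLC", 0.4), ("aggressive LLC", -0.16)]:
    print(f"{label}: ~{loaded_vcore(SET_V, I_LOAD, r_mohm):.3f}v under load")
```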
 
The voltages above are reported via HWMonitor (VIN4). Core voltage per CPU-Z is 1.271 (which is what HWMonitor shows under VID). Am I good to go w. the CPU-Z figure?

It was my understanding that HWMonitor's VID shows the requested voltage, whereas VIN4 shows what you're actually receiving / running.

Either way, 1.3 was the ceiling I had in the back of my mind, so spikes to 1.296 to correct for vdroop don't worry me too badly; I was just hoping I could have a little more control over it.