Question Extremely long POST times, crashes and BSODs with XMP enabled [Z590, 11600K]

Aug 15, 2022
3
0
10
Hi everyone! I'm facing quite a bizarre issue with an Intel 11th-gen platform I built some time ago for my dad. The build initially consisted of:
  • Intel Core i5-11600K CPU (no OC so far; all settings are left at stock)
  • Gigabyte Z590 UD AC Motherboard
  • Crucial 16GB (2x8GB) 3000MHz CL15 Ballistix Sport LT Gray (BLS2K8G4D30AESBK) memory kit (a reliable leftover from my own AMD Ryzen build; I'll refer to it as RAM kit #1)
  • Samsung 980 1TB SSD
  • Thermalright Macho Rev.C Plus CPU Cooler
  • XPG Core Reactor 750W 80 Plus Gold PSU
  • MSI Geforce 1660 Super Gaming X GPU
First off: a few months ago I set up a brand-new platform from those components. It went fine, but there was a single issue with the build: really long POST times, about 20-25 seconds from pressing the power button to merely displaying the motherboard's splash screen and proceeding with the Windows boot. Basically, every time the PC is powered off completely and turned on again, it takes a really long time to start up, yet subsequent restarts are not that bad. It wasn't that big of a deal for me, though it was really puzzling that a brand-new PC from 2022 takes so much longer to POST than even my antique C2D build from 2008. I updated the BIOS to version F5, which sadly didn't fix the issue. As everything else worked fine, I simply gave it to my dad. A few months passed and he complained to me about this issue a few times, so I investigated further. The motherboard has some LEDs showing which component is being checked during startup, and most of the process seems to be spent on memory. That surprised me a bit, as Intel has usually been the brand offering better memory compatibility, but I assumed that maybe it doesn't like kit #1. I went ahead and bought another kit from Gigabyte's compatibility list:
  • Crucial Ballistix 16GB (2x8GB) DDR4 3200MHz (BL2K8G32C16U4B); I'll refer to it as RAM kit #2
I replaced the old sticks with this new kit, but it did not improve the situation at all: POST times were still just as long. Determined to find the cause, I removed the dGPU and tried single sticks from each kit in each memory slot, with XMP profiles on and off. Sadly, that made no difference either. I reached out to Gigabyte support about this issue, but they were not very helpful (they suggested removing all optional components and testing what causes the issue, which I had already done). Ultimately I decided to leave the faster memory kit (#2) in the PC and call it a day; after all, it was just an inconvenience on startup and everything else was fine.

Now here's where things get interesting. After some time my dad complained again, this time because the game he mostly plays (World of Tanks) had started to crash randomly. I initially believed it was a game-related issue introduced by some patch, but after checking Cyberpunk 2077 I confirmed that it also crashed randomly for me. I reinstalled the graphics drivers using DDU and verified again, but the issue was still present. I ran some tests using OCCT (for the CPU) and FurMark (for the GPU) separately, and everything seemed to work just fine, including the temperatures, which were certainly fine. It was only when I ran both benchmarks simultaneously that, within the first few minutes, OCCT reported multiple errors on the CPU's physical cores and began to crash. From then on I knew how to reliably trigger the issue: just test under full system load. Running Intel's Processor Diagnostic Tool alongside FurMark even caused a BSOD with a service exception in dxgkrnl.sys (which I believe points to the GPU driver, though it's definitely not the source of this whole issue), and later OCCT combined with FurMark proved able to trigger BSODs as well.

I was pretty mad about this because I assumed the CPU was somehow damaged (I know that's hard to do, but I suspected the motherboard might have applied some crazy voltage on its auto preset and degraded it within these few months). Surprisingly, after messing with the PC some more, it turned out that disabling XMP fixes the crashing completely. The weirdest thing, however, is that turning XMP back on doesn't trigger the issue immediately: the PC works perfectly fine for a while, only to start crashing again after some more time. Moreover, so far I have been unable to trigger this issue using RAM kit #1 with XMP enabled. It's rock solid when testing and gaming, and the only difference I could see in the sensor readings was the slightly higher bus frequency the motherboard sets for kit #1. I've checked kit #2 using memtest and it seems to be completely error-free on its own. I've even updated the BIOS further, to the latest F7b version. Still no luck.

At this point I'm quite at a loss as to what to do with all of this, and I'm humbly asking for your advice. Currently the PC works fine with kit #1 (aside from the long POST issue), but I feel like I should RMA something; I'm just not sure whether it should be the CPU or the motherboard. I know XMP is not 100% guaranteed to run on every system, but I own a few builds and this is the first time I've seen a reliable platform like Z590 with a popular, well-recognized kit (which these Ballistix sticks surely are) have such a ridiculous stability problem at stock CPU settings. Am I missing something obvious here? Or should I suspect some other component as well? For reference, here are screenshots of test runs with both kits #1 and #2 with XMP enabled, along with as many sensor readings as I could fit on screen:


TL;DR version:
  • PC POSTs for quite a long time (I specifically mean POST; Windows startup is pretty quick), and the LEDs indicate long memory checks regardless of memory kit, number of sticks, slot used, or XMP profile
  • With memory kit #2, applications or even the whole system can crash, but only under full system load (both CPU and GPU need to be under load)
  • Disabling XMP fixes the issue; re-enabling it does not bring the issue back immediately, but it eventually does come back
  • With memory kit #1 and XMP enabled, there's no crashing
  • Memory kit #2 seems to be error-free when tested on its own with memtest
  • The GPU drivers and BIOS are already up to date
I'd appreciate any suggestions. I'd love to simply RMA some component and be done with it, but I'm not sure which one, and I'm afraid that a problem occurring with only one memory kit, and only under full load, won't be an easy issue to replicate.
 
The first kit, Kit #1, is NOT on the Crucial compatibility list for that motherboard, so it is a poor choice, AND if it was a Ryzen-compatible kit, it isn't optimal for Intel-based boards anyhow. It's not like the old days. Yes, some kits MIGHT work on both types of platforms, but for the most part these days you want a memory kit that is specifically validated as compatible with a given motherboard, on either the motherboard's QVL or the memory manufacturer's list. That kit is not, so having problems with it isn't unusual.

The second kit isn't on it either, although a very similar kit ending in BL instead of just B is, and I'd assume the only difference between the two is that the L stands for lighting, while the B alone stands for Black, which is what that designation means. I think this kit SHOULD work, so I'd probably stick with it rather than the other kit, which isn't listed at all for this board.

First, EXACTLY which slots do you have the memory installed in? Count them 1, 2, 3, 4 starting at the CPU, with 4 being the slot closest to the edge of the motherboard.

Second, I'd check very closely that the CPU cooler is not overtightened, and that it is tightened as evenly as possible all the way around. CPU coolers that are too tight, or that are cocked in the socket even a tiny bit because one side or corner is tighter than the rest, can, and often do, cause memory problems. I'd also be inclined to take the cooler back off and remove the CPU to check the motherboard 100% and make sure there are absolutely no bent pins on the board, no damage to any of the contact pads on the CPU, and no debris of any kind between them. It happens. Even experienced builders sometimes screw up when installing a CPU, or minute damage happens to one of the socket pins from something unrelated to installation, and they don't notice it.

Third, if you didn't do it after updating the BIOS, and especially if you didn't do it after installing different hardware on a motherboard that previously ran something else, I'd do a hard reset of the BIOS.

BIOS Hard Reset procedure

Power off the unit, switch the PSU off and unplug the PSU cord from either the wall or the power supply.

Remove the motherboard CMOS battery for about three to five minutes. In some cases it may be necessary to remove the graphics card to access the CMOS battery.

During those five minutes while the CMOS battery is out of the motherboard, press and hold the power button on the case for 15-30 seconds to deplete any residual charge that might be present in the CMOS circuit. After the five minutes are up, reinstall the CMOS battery, making sure to insert it with the correct side up, just as it came out.

If you had to remove the graphics card, you can now reinstall it, but remember to reconnect any power cables that were attached to it, as well as your display cable.

Now, plug the power supply cable back in, switch the PSU back on and power up the system. It should display the POST screen and the options to enter CMOS/BIOS setup. Enter the BIOS setup program and, if necessary, reconfigure the boot settings for either Windows Boot Manager or, on legacy systems, the drive your OS is installed on.

Save settings and exit. If the system POSTs and boots, you can move forward from there, including going back into the BIOS and configuring any other custom settings you need, such as Memory XMP, A-XMP or D.O.C.P profile settings, custom fan profiles, or other specific settings you previously had configured that were wiped out by resetting the CMOS.

In some cases, when you go into the BIOS after a reset, it may be necessary to load the Optimal Defaults or Default values and then save settings to actually get the hardware tables to reset in the boot manager.

It's also worth mentioning that if the reason you need to attempt a hard reset in the first place is a lack of video signal, it's a GOOD IDEA to try a different type of display connection, as for some reason many systems won't work properly with DisplayPort configurations. HDMI is worth trying if you have no display, can't see enough to enter the BIOS, or get "no signal" messages.

If possible, trying a different monitor is also a good idea when there's no display. It happens.


Furthermore, these LONG POST problems are VERY OFTEN the result of having multiple drives installed, one or more of which might have old hidden EFI/Boot partitions from previous Windows installations. You may not know they're still there because you deleted or formatted the existing C: partition without realizing there were other partitions as well. In fact, if you install Windows without removing a drive that previously had Windows on it, Windows will sometimes neglect to create a new EFI/Boot partition on the new drive. If you later remove that older drive, the system will fail to boot entirely, because the drive holding the current Windows installation has no boot partition of its own and there is no longer one anywhere in the system.

If you have more than one drive, I'd remove the secondary drive and see what happens.

If these are old drives, I'd test them thoroughly using Hard Disk Sentinel. It might also be a good idea to download SeaTools for Windows and run the short drive self test (DST) and the long generic test to check for problems.

If you did not do a clean install of Windows when you assembled this, that is probably a good idea as well.
 
@Darkbreeze Thanks for the answer! You're right, there's a slight difference in the model number of the second kit. I'd love to be able to just pick memory kits from the QVL, but in my region it's usually hard to obtain the right kits: the listed ones are either unavailable or disappear over time. These were the closest I could get, aside from some lower-clocked ones. Anyway, to answer your question: my memory sticks are in slots 2 and 4. Checking whether the CPU cooler is tightened too hard is actually a great idea. The Macho is pretty beefy, and I tried my best not to overdo it, but maybe I did. Is there a good visual indicator that it was tightened too hard (aside from mechanical damage)? I'll check whether I have any thermal paste left and inspect both the CPU and the socket. I don't recall performing a hard reset after the update, so I'll try that too. About the drives, I only have the M.2 Samsung SSD installed at the moment. It was brand new and I did a clean install of Windows 10 on it (no cloning from a previous setup or anything). I didn't try removing it for that reason, but if your other suggestions fail, I'll try booting a live Linux distro from USB without the SSD present to see if it helps. I'll report back as soon as I've tried those solutions.
 
I mean, as far as the CPU cooler goes, the rule of thumb I like to follow, for both the backplate-to-bracket and bracket-to-cooler bolts, is to seat them lightly, just to the point where tightening any further would take a little elbow grease, so to speak. You don't want anything loose, but you don't need to crank down on anything either. Once things feel like you'd have to give it a bit of muscle to turn further, that's when I generally stop. Some people like to do that and then give it an extra eighth to quarter turn, but that is often where they make a mistake.

You have the memory in the correct slots, so that isn't the issue.
 
Update: I'm cautiously optimistic that the PC's issues have been solved after loosening the cooler and performing a BIOS hard reset, though I believe only the latter actually improved things. POST time decreased from 20-25 seconds to 10-11 seconds. I'm running memory kit #2 (BL2K8G32C16U4B) with XMP enabled once again, and so far my dad hasn't experienced any crashes whatsoever, though I'm still not 100% sure it's fixed, since the crashes were kind of random. Anyway, so far so good. Thank you very much for the suggestions, @Darkbreeze!