Question Unstable computer shuts down unpredictably

Myronazz

Hi there,

I have an Alder Lake machine that likes to freeze and reboot with "RAM overclock failure". Sometimes it won't POST at all, giving memory-related beep codes, and I have to take out one of the memory sticks to get it booting. A CMOS reset does not help.

It doesn't matter which one of the sticks I take out. As long as only one is there.

The computer does this with & without XMP enabled. Even a memory speed below the manufacturer's recommendation causes this.

Funnily, back when the computer was first built in early 2022, it had similar behavior. Random shutdowns and freezes until one day it refused to POST at all. I figured it was a motherboard problem so I replaced it and it worked fine for a year... but now it's doing this again.

I can't figure this computer out. Did a memory test and it passed. Did a stress test with Prime95 and Furmark and it passed. And it's perfectly stable until it decides to crash after a few days of stability. Yes, days. Not hours or minutes.

How possible is it that my new MSI board is dying? That would be some pretty terrible luck. But if the memory is good and the PSU is fine, what else could it be? The CPU? I've never in my entire life heard of a dead CPU. Much less a misbehaving one.

Any tips on how to proceed?

Specs:
  • GPU: MSI RTX 3050
  • CPU: i5 12400f
  • RAM: 16GB DDR4 @ 3200MHz
  • Motherboard: MSI PRO H610M-B (Previously a Gigabyte H610M-H)
  • PSU: Corsair RM650X
  • SSD: Some WD Blue 240G NVMe drive
CPU-Z memory:

[Image: IIDavYA.png]


(Sorry about the photograph instead of a screenshot. It's all I have right now)
 
What is your current motherboard BIOS version?

That Klevv memory kit is not on the motherboard QVL list, and Klevv does not seem to have a compatibility list or utility like G.Skill, Corsair and Crucial have, so it's impossible to gauge whether it's actually compatible with that board or not. Just because it's DDR4 and the board takes DDR4 doesn't mean it's compatible. It is always wise to ONLY purchase kits that you can confirm are validated for any given motherboard. Of course that doesn't mean a given kit CAN'T work, but it also doesn't mean it can. Sometimes a kit that has not been validated will just "work" or can be tuned to work by tweaking the frequency, voltage or timings, but even if you are willing to go through that process there are no guarantees.

Klevv is also a wildcard. They have some fairly good quality kits but they are a small player in the retail memory segment compared to these others and it's likely there is not nearly as broad of support for as wide a range of platforms using Klevv memory as there is using more mainstream brands.

I'd say try setting the XMP profile and then, before you save settings and restart, bump the DRAM voltage up to like 1.36v. Then save settings and exit BIOS. If no love, try 1.37. Repeat until you either have success or reach 1.4v because if it doesn't work by then it's probably not going to stabilize via voltage. At that point you can revert back to like 1.35-1.36.
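As a rough illustration of that stepping routine, here is a minimal Python sketch that just lists the voltages to try in order. The function name and the millivolt parameters are made up for this example; the 1.36 V start, 0.01 V step and 1.40 V ceiling come from the paragraph above. It does not touch the BIOS in any way, the values still have to be entered by hand.

```python
def dram_voltage_steps(start_mv: int = 1360, ceiling_mv: int = 1400, step_mv: int = 10) -> list[float]:
    """Return the DRAM voltages (in volts) to try, lowest first.

    Integer millivolts are used internally so the steps do not pick up
    floating-point drift. All three defaults are assumptions taken from
    the advice above (1.36 V start, 1.40 V ceiling, 0.01 V steps).
    """
    if step_mv <= 0:
        raise ValueError("step_mv must be positive")
    return [mv / 1000 for mv in range(start_mv, ceiling_mv + 1, step_mv)]


if __name__ == "__main__":
    for volts in dram_voltage_steps():
        # Each value is set manually in the BIOS, then the system is tested.
        print(f"Try DRAM voltage {volts:.2f} V, save, exit BIOS, and check stability")
```

If none of the listed values stabilizes the system, the advice above is to stop raising the voltage and go back to roughly 1.35-1.36 V.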

Then, I'd pull the CPU and look for bent pins on the motherboard. If there are no bent pins on the motherboard, reinstall, with fresh paste after cleaning the old paste off, and make sure when you install the CPU cooler that it is completely evenly tightened all the way around because a CPU cooler that is unevenly tightened or where one of the retaining pins has come free, can act exactly like a bent pin on the motherboard or CPU.
 
That Klevv memory kit is not on the motherboard QVL list, and Klevv does not seem to have a compatibility list or utility like G.Skill, Corsair and Crucial have, so it's impossible to gauge whether it's actually compatible with that board or not. Just because it's DDR4 and the board takes DDR4 doesn't mean it's compatible. It is always wise to ONLY purchase kits that you can confirm are validated for any given motherboard. Of course that doesn't mean a given kit CAN'T work, but it also doesn't mean it can. Sometimes a kit that has not been validated will just "work" or can be tuned to work by tweaking the frequency, voltage or timings, but even if you are willing to go through that process there are no guarantees.
I had no idea such lists existed. My whole life, I just installed random modules I had lying around (yes, I would mix them too) and things always worked out. Not saying that this is a good thing to do, but it is what I did.

Has this stuff always been a thing?

Then, I'd pull the CPU and look for bent pins on the motherboard.
I seriously doubt there are bent pins. But is it possible that the cooler's mounting screws became loose over time? Because like I said, this was working fine for a year. It's only now that it's having all these problems.

Furthermore, didn't Alder Lake have a bending problem?

I can do it but I don't have thermal paste. I have to get some first.

I'd say try setting the XMP profile and then, before you save settings and restart, bump the DRAM voltage up to like 1.36v. Then save settings and exit BIOS. If no love, try 1.37. Repeat until you either have success or reach 1.4v because if it doesn't work by then it's probably not going to stabilize via voltage. At that point you can revert back to like 1.35-1.36.

This is a good idea, but it can take days for it to actually crash. I wish there was a reliable way to make it crash. Prime 95's memory controller stress test does nothing. Works fine even after hours.

This is so tricky...

In general, I hate it when a computer has such irregular and rare behavior. Makes isolating the problem very hard.
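One way to put a number on "how long do I have to wait before I can trust a fix" is a back-of-the-envelope estimate. The sketch below is only that: it assumes crashes arrive at random with a constant average rate (an exponential waiting-time model), which is an assumption and not something established in this thread. The function name and the 3-day average are hypothetical.

```python
import math


def crash_free_days_needed(mean_days_between_crashes: float, confidence: float = 0.95) -> float:
    """Days of crash-free running needed before a fix looks real.

    Assumption: crashes follow an exponential waiting-time model, so the
    chance of seeing no crash in t days on an UNFIXED system is
    exp(-t / mean). We solve exp(-t / mean) = 1 - confidence for t.
    """
    if mean_days_between_crashes <= 0:
        raise ValueError("mean_days_between_crashes must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -mean_days_between_crashes * math.log(1 - confidence)


if __name__ == "__main__":
    # Hypothetical figure: a crash roughly every 3 days on average.
    days = crash_free_days_needed(3.0, 0.95)
    print(f"Run for about {days:.1f} crash-free days before trusting the change")
```

Under those assumptions, a system that crashes every three days or so needs roughly nine clean days per change, which is why stepping through several settings one at a time is so slow with a fault like this.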
 
There have been motherboard QVL lists since pretty much around the time DDR3 went from low density to high density design, if not longer. I can't remember if they were a thing prior to that timeframe.

Alder Lake did have bending problems. Could be relevant. When I installed my 12700K, I made sure to implement the mod shown at Igor's Lab, who I believe got the process from der8auer. I guess it could be relevant in your case but honestly I haven't heard much of anything on that subject since the initial ruckus.

Might just need to get a better board. IDK, you're right, intermittent problems are a major pain.
 

I'm so sorry for the late reply. Been busy.

What I ended up doing was to buy new memory modules. It doesn't look like the Klevv ones are faulty so I can just re-use them on my Ryzen PC (The problematic Intel PC is my brother's).

I looked at my motherboard's QVL list and none of the modules were available for sale, except maybe on the eBay used market, which I didn't want to bother with (who knows what they've been through?)

In the end, I bought some Corsair Vengeance LPX modules because apparently, those are hand-picked and of good quality. It's a gamble, I know, but I'm willing to risk it. If it doesn't work, well, I don't know what I'll do.
 
You never answered the question about what your current BIOS version is. VERY OFTEN getting some memory kit to work on a specific motherboard requires only updating the BIOS, as manufacturers tend to improve memory compatibility many times via BIOS updates over the course of the life of a motherboard. And it's not always something that is noted in the details regarding that release version. Other times it's the primary reason for the update.

Any of the kits listed below are 100% validated for use with your motherboard.

PCPartPicker Part List

Memory: G.Skill Ripjaws V 16 GB (2 x 8 GB) DDR4-3200 CL16 Memory ($38.99 @ Newegg)
Total: $38.99
Prices include shipping, taxes, and discounts when available
Generated by PCPartPicker 2023-06-17 12:24 EDT-0400



PCPartPicker Part List

Memory: G.Skill Ripjaws V 16 GB (2 x 8 GB) DDR4-3200 CL16 Memory ($37.99 @ Newegg)
Total: $37.99
Prices include shipping, taxes, and discounts when available
Generated by PCPartPicker 2023-06-17 12:33 EDT-0400



PCPartPicker Part List

Memory: G.Skill Ripjaws V 16 GB (2 x 8 GB) DDR4-3200 CL16 Memory ($38.99 @ Newegg)
Total: $38.99
Prices include shipping, taxes, and discounts when available
Generated by PCPartPicker 2023-06-17 12:33 EDT-0400



PCPartPicker Part List

Memory: G.Skill Trident Z RGB 16 GB (2 x 8 GB) DDR4-3200 CL16 Memory ($56.99 @ Newegg)
Total: $56.99
Prices include shipping, taxes, and discounts when available
Generated by PCPartPicker 2023-06-17 12:35 EDT-0400



PCPartPicker Part List

Memory: G.Skill Trident Z 16 GB (2 x 8 GB) DDR4-3200 CL16 Memory ($48.99 @ Newegg)
Total: $48.99
Prices include shipping, taxes, and discounts when available
Generated by PCPartPicker 2023-06-17 12:47 EDT-0400



PCPartPicker Part List

Memory: Crucial Pro 32 GB (2 x 16 GB) DDR4-3200 CL22 Memory ($64.98 @ Amazon)
Total: $64.98
Prices include shipping, taxes, and discounts when available
Generated by PCPartPicker 2023-06-17 12:54 EDT-0400
 
These are all in America. I'm from the UK 😛

My BIOS version is 7D46v12. It's outdated but now that I think about it... is it possible that Windows updates caused some kind of incompatibility? I doubt it but it's worth asking. It might explain why it was working for a couple of months and then it started to suddenly freeze.

I put in the new RAM yesterday. No crashes yet. But I didn't define timings, speed, gear mode, and command rate. I just enabled XMP and let it do its thing. I saw that it picked timings close to 15-16, which is what the memory is capable of. Bad idea or good idea? Should I just manually set them? I don't think that I trust XMP anymore.

In other news, I put the KLEVV modules in my Ryzen 5 2400G PC. No crashes, and it's even clocked at 3200MHz. I even tried 3400MHz but it wasn't stable. Screw you Zen 1...

So, as I suspected, the modules weren't faulty, the Alder Lake build is just very sensitive to memory (which is what you pointed out with the QVL list).
 
Just because those links are in the US does not mean those parts are not available worldwide, especially in the UK.

You need to update to the latest BIOS version for your motherboard. There are like five newer BIOS versions that ALL have a primary reason of improved memory compatibility as the main change. You don't need to do each of them, just update to the latest one. Afterwards, do a hard reset of the BIOS and then enable XMP and see what happens. I'm inclined to believe that this might well be all you need to do to fix the problem, because OFTEN it IS. So regardless, you need to update the BIOS before going any further. This is your MOST likely solution. If it does not fix the problem we can move forward from there.
 
Just because those links are in the US does not mean those parts are not available worldwide, especially in the UK.

Oh, I tried one of them but it returned similar kits with different heatsink colours (which might mean that it's a different model). I'm sure with some looking around it's possible, but I already purchased new RAM, so I didn't look too deep into it.

The computer crashed again, which tbh I was hoping for. It might be the BIOS as you say, so I updated that, and we'll see if it does it again.

I didn't define timings as I said. I just enabled XMP and let it do its thing. I guess using JEDEC speeds is the next step if it crashes again.

And if it crashes yet again... Well... I will be really lost.
 
So, if you still have problems, I'd disable XMP and run at the default JEDEC configuration, then run Memtest86 to see if there are problems without XMP enabled. If no errors after the full four passes of all tests, then enable XMP again and run Memtest 86 again to test the XMP configuration.


Memtest86


Go to the Passmark software website and download the USB Memtest86 free version. You can do the optical disk version too if for some reason you cannot use a bootable USB flash drive.


Create bootable media using the downloaded Memtest86. Once you have done that, go into your BIOS and configure the system to boot to the USB drive that contains the Memtest86 USB media or the optical drive if using that option.


You CAN use Memtest86+, as they've recently updated the program after MANY years of no updates, but for the purpose of this guide I recommend using the Passmark version as this is a tried and true utility while I've not had the opportunity to investigate the reliability of the latest 86+ release as compared to Memtest86. Possibly, consider using Memtest86+ as simply a secondary test to Memtest86, much as Windows memory diagnostic utility and Prime95 Blend or custom modes can be used for a second opinion utility.


Create a bootable USB Flash drive:

1. Download the Windows MemTest86 USB image.

2. Right click on the downloaded file and select the "Extract to Here" option. This places the USB image and imaging tool into the current folder.

3. Run the included imageUSB tool, it should already have the image file selected and you just need to choose which connected USB drive to turn into a bootable drive. Note that this will erase all data on the drive.



No memory should ever fail to pass Memtest86 when it is at the default configuration that the system sets it at when you start out or do a clear CMOS by removing the CMOS battery for five minutes.

Best method for testing memory is to first run four passes of Memtest86, all 11 tests, WITH the memory at the default configuration. This should be done BEFORE setting the memory to the XMP profile settings. The paid version has 13 tests but the free version only has tests 1-10 and test 13. So run full passes of all 11 tests. Be sure to download the latest version of Memtest86. Memtest86+ went MANY years without updates before its recent release, so I still treat regular Memtest86 from Passmark software as the primary tool.

If there are ANY errors, at all, then the memory configuration is not stable. Bumping the DRAM voltage up slightly may resolve that OR you may need to make adjustments to the primary timings. There are very few secondary or tertiary timings that should be altered. I can tell you about those if you are trying to tighten your memory timings.

If you cannot pass Memtest86 with the memory at the XMP configuration settings then I would recommend restoring the memory to the default JEDEC SPD of 1333/2133 MHz (depending on your platform and memory type) with everything left on the auto/default configuration and running Memtest86 over again. If it completes the four full passes without error you can try again with the XMP settings, but first try bumping the DRAM voltage up once again by whatever small increment the motherboard will allow you to increase it by. If it passes, great, move on to the Prime95 testing.

If it still fails, try once again bumping the voltage if you are still within the maximum allowable voltage for your memory type and test again. If it still fails, you are likely going to need more advanced help with configuring your primary timings and should return the memory to the default configuration until you can sort it out.

If the memory will not pass Memtest86 for four passes when it IS at the stock default non-XMP configuration, even after a minor bump in voltage, then there is likely something physically wrong with one or more of the memory modules and I'd recommend running Memtest on each individual module, separately, to determine which module is causing the issue. If you find a single module that is faulty you should contact the seller or the memory manufacturer and have them replace the memory as a SET. Memory comes matched for a reason as I made clear earlier and if you let them replace only one module rather than the entire set you are back to using unmatched memory which is an open door for problems with incompatible memory.

Be aware that you SHOULD run Memtest86 to test the memory at the default, non-XMP, non-custom profile settings BEFORE ever making any changes to the memory configuration so that you will know if the problem is a setting or is a physical problem with the memory.
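The testing order described above can be summarized as a small decision helper. This is only a sketch in Python under stated assumptions: the function name, the wording of the returned messages, the 1.35 V starting point and the 1.40 V ceiling are all illustrative and not taken from Memtest86 or any motherboard tool. Check the real maximum for your own memory before raising anything.

```python
from typing import Optional


def next_memory_step(
    default_passed: bool,
    xmp_passed: Optional[bool] = None,
    dram_mv: int = 1350,   # assumed current DRAM voltage in millivolts
    max_mv: int = 1400,    # assumed ceiling; verify for your memory type
    step_mv: int = 10,     # assumed smallest step the board allows
) -> str:
    """Suggest the next action after four full Memtest86 passes.

    default_passed: result at the stock JEDEC configuration (after a
                    minor voltage bump, if one was tried).
    xmp_passed:     result with XMP enabled, or None if not tested yet.
    """
    if not default_passed:
        # Failing at stock settings points to a physical fault.
        return "Test each module separately; replace the kit as a set if one is faulty"
    if xmp_passed is None:
        return "Enable XMP and run four passes of Memtest86"
    if xmp_passed:
        return "Move on to Prime95 testing"
    if dram_mv + step_mv <= max_mv:
        return f"Raise DRAM voltage to {(dram_mv + step_mv) / 1000:.2f} V and retest with XMP"
    # Out of voltage headroom: timings need work, so fall back to defaults.
    return "Return to the default configuration and get help with primary timings"


if __name__ == "__main__":
    print(next_memory_step(default_passed=True, xmp_passed=False, dram_mv=1350))
```

The default configuration is checked first on purpose: it separates a settings problem from a physical problem before any XMP or voltage changes muddy the picture.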
 
Hi, sorry for the late reply. I was busy, but the computer is still crashing even after updating to the latest BIOS and upgrading the modules to the Corsair ones.

I did everything you said and Memtest always passes without issue. Without XMP at motherboard defaults, with XMP, and then JEDEC. I don't think it's the memory to be honest. And it's really annoying how it crashes occasionally.

I'm at a loss at this point. If it's not the memory, then what is it? Because if it always passes Memtest86 (the Passmark version btw) then what else could it be?

As I said, the system was working fine for a year with this XMP-3200 profile it always used. It worked reliably with no crashes, so why would that change unless something simply went faulty? I really don't think it's the memory but if you think otherwise, feel free to tell me.

What a troublesome system...
 
Well, if it isn't the memory, then it's got to be either the board or the CPU. Nothing else that would cause a "RAM overclock" error.

This is what I think too. But as I said, I already changed the motherboard, and it's dead again? It just seems unfathomable to me. What are even the chances?

You know, that previous Gigabyte board that died? I sold it on eBay as faulty just to make back some cash from that replacement. And the guy left me some feedback saying that it worked perfectly, which was big confusion time.

But that specific board had a habit of POSTing successfully one month, and then unsuccessfully the next. It's just that it had gone completely dead by the time I sold it, with no signs of reviving itself like it was doing before. Aside from that, however, it was also freezing in similar ways to the new board. So, I wonder if I had two faulty components at the same time? The CPU was causing the crashes, and the motherboard was causing those random days where it would refuse to POST (but then POST the next day without me doing anything).

I swear to God, if I have a faulty CPU, I'm going to throw that entire computer out the window and forget about it. I'm going back to Sandy Bridge!

Seriously though, what are even the chances that the CPU is dead? And there's no way to test that without trying another CPU, given that the problem is intermittent.

Despite the theory, when probabilities of failure are considered, I think that the motherboard simply went bad again. CPUs don't die. 40-year old Intel 8088's still work and are sold and bought. I've never seen a dead CPU and I have a box full of them.

Unless, Alder Lake bending issues...?
 
Most problems on PC hardware are straight up due to user error. Either because they did something wrong during the installation or because something is not configured properly. But failures that have nothing at all to do with user error happen all the time. Every day. All day long. These are electronics and electronics tend to fail. Especially power supplies, graphics cards and motherboards, but also drives, fans and in some cases CPUs.

CPUs that did not have a problem right out of the box (Which also happens, but it's very infrequent) usually don't tend to just "go bad" later on unless the user has abused it in some way by dropping it during maintenance (Changing coolers, paste, etc.), or configuring an unrealistic overclock, using too much voltage for too long which can cause electromigration and VT Shift as the CPU degrades from the overclock or increased voltage, or something like moisture related problems especially liquids spilled on the tower that get inside and tend to short things out.

Other things though can cause CPU failures. A bad motherboard or memory, or really, ANY bad component that is shorted or has a similar issue can fry a CPU if it is pulling too many amps through the CPU.

Older CPUs, for the most part, are no more resistant or prone to failure than newer ones, and vice versa. If you have a bad CPU, then it's either bad because of something you did or just plain bad luck. For one thing, this is a reason I will not buy one of the "F" model CPUs. They are samples where there was some kind of problem with the onboard graphics so that is disabled and it is sold as a non-iGPU model, but it still means there was something wrong with it in the beginning as they don't generally disable the graphics on normally working samples. So to me, I just think they are inferior models to begin with, but plenty of people run them so maybe that's untrue. Still, the cost is very similar to models WITH graphics, and having graphics for the purpose of troubleshooting when there is a display issue, or simply the ability to use the CPU if you don't have a graphics card, makes it more than worth it to me.

All that being said, CPUs CAN have problems or fail, it's just not common. I'd pull the board out of the case, set it up with minimal hardware connected on a piece of cardboard or the box the motherboard came in and look for problems while you set it up for bench testing, then run it that way for a few days to see if you still have a problem. I'd definitely get some paste and pull the CPU and cooler to make sure there are no problems there. Even ONE bent pin on the motherboard or bad contact pad on the CPU can cause problems and sometimes it doesn't happen immediately but over time the normal vibrations the system is subjected to from fans and drives or moving the case a bit occasionally for cleaning or other maintenance can cause an issue that was only borderline problematic to change, and become a bigger issue.

We see this sometimes with people who don't pay appropriate attention to where the standoffs are preinstalled in the case and mount the motherboard anyhow, so that maybe there's one standoff in a location there shouldn't be one for that form factor of motherboard. Now, in some cases this might not cause problems right away because of the layers used to protect some of the electrical traces and the conformal coating used to protect the whole board, but in time that standoff can wear through such things and eventually (or immediately) come into contact with one of the traces or some other component of the board, causing a short or other problems, and depending on what is being made contact with and how, it could potentially also come and go intermittently. I'm not saying this IS your problem, just that this is one way in which intermittent problems like yours can happen.

Since you had the same problem with another board, it makes me very skeptical that the board is the problem. I'd look at ONLY hardware that was used on both builds, so CPU, drives or improper installation. I've even seen the mini board on the front case I/O panel cause problems.

 
The minimalist approach is a good idea. The only problem is that this is my brother's build, so I can't exactly afford to take it away from him for days just to test it. Testing it for just a couple of hours is already a challenge. And he's kinda angry with this whole situation. I mean, he trusted me and I failed pretty spectacularly. I never thought that this would happen. I've done countless builds and I always figure the problem out. That said, the fact that the system isn't always available for testing might contribute to that.

Anyway. We're both students abroad and we're away for the summer, so for now I can't really test the system no matter what I do because it isn't with him or with me. But once we return in September, I'm thinking I should just buy him a new system and take the old one myself. That way I receive an upgrade and I have much more time to figure out what on earth is wrong with it. Right now, I have a Ryzen 5 2400G system (which I put together myself) and it's falling behind the times... I've been thinking it's time to move on.

CPUs that did not have a problem right out of the box (Which also happens, but it's very infrequent) usually don't tend to just "go bad" later on unless the user has abused it in some way by dropping it during maintenance (Changing coolers, paste, etc.), or configuring an unrealistic overclock, using too much voltage for too long which can cause electromigration and VT Shift as the CPU degrades from the overclock or increased voltage, or something like moisture related problems especially liquids spilled on the tower that get inside and tend to short things out.

I definitely neither overclocked nor dropped the CPU. Hell, my H610M motherboard isn't even allowing me to keep a constant boost because it has a power limit on the PL1. How stupid. This is why I dislike Intel. But anyway, the point is, I didn't mess with it, and it isn't overclocked. It hasn't even run at its maximum stock clock possible. Thanks again Intel! (My Coffee Lake laptop has the exact same problem).

Older CPUs, for the most part, are no more resistant or prone to failure than newer ones, and vice versa. If you have a bad CPU, then it's either bad because of something you did or just plain bad luck. For one thing, this is a reason I will not buy one of the "F" model CPUs. They are samples where there was some kind of problem with the onboard graphics so that is disabled and it is sold as a non-iGPU model, but it still means there was something wrong with it in the beginning as they don't generally disable the graphics on normally working samples.

Huh. Wild. I didn't know this. A non-F version was not much more expensive but I didn't think I need it. Now I do regret it. But once I do bench the system, I have a known-good GPU to use, so no problem.

What you're saying makes sense... but you'd also think that Intel rigorously tests their CPUs to make sure they're okay. Then again, what do I know, right? I'm just speculating.

We see this sometimes with people who don't pay appropriate attention to where the standoffs are preinstalled in the case and mount the motherboard anyhow, so that maybe there's one standoff in a location there shouldn't be one for that form factor of motherboard. Now, in some cases this might not cause problems right away because of the layers used to protect some of the electrical traces and the conformal coating used to protect the whole board, but in time that standoff can wear through such things and eventually (or immediately) come into contact with one of the traces or some other component of the board, causing a short or other problems, and depending on what is being made contact with and how, it could potentially also come and go intermittently. I'm not saying this IS your problem, just that this is one way in which intermittent problems like yours can happen.

I paid attention to where the stand-offs were, so no worries there. I do soldering though, and sometimes I run wires to broken traces by scraping them to expose the copper. That takes a lot of effort, but I guess it could happen if there is enough force between the stand-off and trace.

Since you had the same problem with another board, it makes me very skeptical that the board is the problem. I'd look at ONLY hardware that was used on both builds, so CPU, drives or improper installation. I've even seen the mini board on the front case I/O panel cause problems.

Yeah, I'll do that once I get the chance. I just hope that the CPU isn't bad as it wasn't exactly cheap and it's now out of warranty.

You mention the front IO being bad. My USB-3 front IO never worked despite being connected on both boards. Now I wonder if that has anything to do with it. It's a good high-quality case though from a reputable brand so I wouldn't exactly expect it.
 
Just because the case brand is reputable doesn't mean they couldn't get a bad I/O board from whoever supplies it to them. They certainly don't manufacture it themselves. And again, ANY electronic component can have problems or be faulty, and sometimes it's because of something that happens AFTER it has left the manufacturer. Like a crushed pallet, or a box that falls six feet from the back of a truck to the ground and is simply put back with no regard for whether it might have been damaged or not. Things like that. It happens.
 
Now I'm wondering whether it had a short and it was slowly killing my mainboards 0.0

Well, probably not? I'd have to test it once I go back. If there's a short it should be measurable. That's the immediate danger.

I really doubt it though. If there was a VCC short to ground, Windows lets you know... usually. I know from personal experiments where I drew too much current from the USB ports. Not on this PC but another one.

Anyway, I shall test it bare with the most minimal components. It'll be really funny if it's the case though. That would be wild.

Thanks for sticking to my complicated thread. I won't be able to try anything until September.

But I suspect that if I keep the build, I'll swap the mainboard for one that allows you to unlock the PL1 limits. I still find it extremely stupid that the mainboard is holding back my CPU. Again, I want to go back to Sandy Bridge! I'm going to figure out the problem first though. I'm not about to get yet another motherboard without first being fully confident about the actual problem.