Nov 1, 2023
8
0
10
Hey folks, just finished my new gaming rig build (specs and software versions [1]), and I'm experiencing disappointing system instability. I'm a longtime computer engineer and deeply technical, but I'm a complete novice when it comes to Windows 11 (or Windows in general, to be honest; I'm a Linux guy). I'm hoping someone can point me in the right direction so I can start narrowing down where these issues originate and pinpoint the root cause. I'm not even certain it's a hardware issue, but it definitely seems like one based on my experience.

I'm experiencing random system issues: I've seen a handful of BSODs, and random applications lock up (Not Responding). When I launch Space Marine 2, it crashes pretty much immediately, during the title screens, although not reliably in the exact same place. I also seem to be unable to run other games without crashes. The system is more stable when I'm just navigating around the desktop.

I have 4 DIMMs right now from two different packages of identical RAM sticks. I'm also running the "Gaming" profile, set inside of my BIOS. I have flashed the BIOS to the latest version available from MSI, which contains the microcode patch that repairs the known issue with 14th gen Intel chips overvolting and causing system instability.

Right now I suspect it's a bad RAM stick (at least I'm hoping that is the issue). Are there logs or events somewhere that might provide some clues for more directed verification testing? I'm planning to take the A1/B1 sticks out right now to see if that makes any difference, then swap the DIMMs to see if that results in better behavior. Ultimately I'm going to end up running MemTest on each stick.
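On the logs question, here is a minimal sketch of one way to pull recent crash-related entries from the Windows System event log using the built-in `wevtutil` tool. The provider names and event IDs in `SOURCES` are what I would expect to be relevant (WHEA hardware errors, Kernel-Power unexpected reboots, bugcheck records); treat them as assumptions and confirm them in Event Viewer on your own machine. The helper names (`build_query`, `build_command`, `dump_recent`) are made up for this example.

```python
import subprocess

# Providers that are usually worth a look after crashes. The descriptions and
# event IDs are assumptions to verify in Event Viewer, not guarantees.
SOURCES = {
    "Microsoft-Windows-WHEA-Logger": "hardware errors reported by the platform",
    "Microsoft-Windows-Kernel-Power": "unexpected shutdowns/reboots (Event ID 41)",
    "Microsoft-Windows-WER-SystemErrorReporting": "bugcheck/BSOD records (Event ID 1001)",
}


def build_query(provider: str) -> str:
    """XPath filter that selects events from a single provider."""
    return f"*[System[Provider[@Name='{provider}']]]"


def build_command(provider: str, count: int = 10) -> list:
    """wevtutil invocation: newest `count` events from `provider`, as text."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_query(provider)}",
        f"/c:{count}",
        "/rd:true",  # reverse direction, i.e. newest events first
        "/f:text",
    ]


def dump_recent(count: int = 10) -> None:
    """Print recent events for each provider. Windows only."""
    for provider, meaning in SOURCES.items():
        print(f"=== {provider}: {meaning} ===")
        result = subprocess.run(
            build_command(provider, count), capture_output=True, text=True
        )
        print(result.stdout or "(no events found)")
```

Calling `dump_recent()` from a terminal on the affected machine prints the newest entries per provider. WHEA entries in particular tend to point at hardware rather than software, though they do not by themselves say which component is at fault.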

Thoughts?

[1] https://privatebin.net/?e5e39d61f68c6845#FmaD1ajSwfLsRdzTNB7bnnpHr3TrX1ASisqxswAdvZAi
 

Lutfij

Titan
Moderator
Note: I believe this to be the Microcode update that addresses the 14th gen overvoltage issue
Correct.

RAM: 4x CORSAIR Dominator Platinum RGB 32GB (2 x 16GB) 288-Pin PC RAM DDR5 6600 (PC5 52800) Desktop Memory Model CMT32GX5M2X6600C32
Can you look at both ram kits and see if you can identify the PCB revision numbers on them?

I have flashed the BIOS to the latest version available from MSI
Did you clear the CMOS after verifying your BIOS was flashed to the latest version? To do so, disconnect the system from the wall and the display, remove the CMOS battery, press and hold the power button for 30 seconds, then put the CMOS battery back after 30 minutes. Alternatively, after disconnecting from the wall and display, you could press and hold the Clear CMOS button on the rear for 30 seconds.

I'm a complete novice when it comes to Windows 11
Did you install the OS in offline mode, and then install all the drivers relevant to your platform with elevated privileges, i.e., right-click the installer > Run as administrator?
 
Nov 1, 2023
Thank you for the quick reply @Lutfij !

Can you look at both ram kits and see if you can identify the PCB revision numbers on them?

I'll fetch this for you now.

Did you clear the CMOS after verifying your BIOS was flashed to the latest version?

I did not, I didn't know that was required or a best practice. I will do that now!

Did you install the OS in offline mode, and then install all the drivers relevant to your platform with elevated privileges, i.e., right-click the installer > Run as administrator?

I created the Win11 install USB using Rufus (?), which provided me a few options for the installer. I believe that I installed in online mode, and the machine rebooted several times as it seemed to go through rounds of updates.

Once I was able to log into the desktop, an MSI launcher appeared automatically and offered a bunch of options for installation; I don't recall whether it installed the drivers or not. To be sure, I went to the MSI support page for the mobo and downloaded all the drivers, then installed/repaired/reinstalled all of them. A couple errored out, which I did not expect. The link below has the list of completed installs and those that errored (primarily RST & Thunderbolt). The Thunderbolt error code from the log seems to indicate a conflict with a driver that is already installed. It's possible that I have newer drivers installed than those hosted on the MSI support page.

Note: I did attempt to launch SM2 after doing this, and I'm still seeing the same crash dumps.

https://dpaste.com//6AQGLSLMT
 
Nov 1, 2023
Hallmark signs of Intel 14th gen degradation. Sorry man. Start that exchange request immediately.
Kind of what I thought. I want to get it kicked off. Some interesting details: I bought identical components for both myself and my brother-in-law in Dec 23. I built his machine then, and he's been using it since. I just built mine. He has had no instability at all, and I'm having the aforementioned issues.

I haven't been following the news closely. What is this process like? Is this a hardware issue for which they have issued revisions that avoid the problem, or are they just going to replace it with another vulnerable chip? I purchased the chip through Newegg in Dec. What's the current recommended process for claiming a vulnerable chip?

I'm getting random BSODs at this point. The issues are highly variable and not predictable based on load.
 
Nov 1, 2023
Update:
I have an MSI MEG Z690 Ace mobo, which has a dedicated "Clear CMOS" button. I powered off the machine, disconnected the power cable, turned off the PSU switch, and held the button for 15 seconds as advised. I plugged it all back in and restarted. I was actually able to play ~15 minutes of SM2 before I got a BSOD.
 
Nov 1, 2023
Hallmark signs of Intel 14th gen degradation. Sorry man. Start that exchange request immediately.
That does not make much sense now that I think about it. The first time power ever touched this machine, I updated the BIOS to the latest version from MSI which has the overvoltage patch:

BIOS: 1.I0 (type: UEFI) 7D27v1I, Release date 2024-08-29
Update CPU Microcode 0x129

If we assume that this patch correctly fixes the issue (and maybe that's not true), how would it be possible for this to be my problem? Are there other known degradation issues with 14th gen CPUs that *aren't* fixed by this patch?

Also have a major update: I started to run memtest with all 4 DIMMs and it started to throw errors almost immediately. I'm now testing individual sticks, and the first stick seems to be okay. I'm unsure what to make of these errors; they report expected values not matching the values actually read back. Is it possible that this is the result of a bad CPU, or is it safe to assume that I've got a bad stick in the lot?
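For anyone unsure what an "expected vs. actual" memory error means, here is a toy sketch of the basic idea behind a pattern test: write a known value to every cell, read it back, and report any mismatch. This is only an illustration. Real testers such as MemTest86 and memtest86+ use more elaborate algorithms, and `StuckBitMemory` is a hypothetical stand-in for faulty hardware, not a model of any specific failure.

```python
def pattern_test(mem, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Write each pattern to every cell, read it back, collect mismatches.

    Returns a list of (address, expected, actual) tuples.
    """
    errors = []
    for pattern in patterns:
        for addr in range(len(mem)):
            mem[addr] = pattern
        for addr in range(len(mem)):
            actual = mem[addr]
            if actual != pattern:
                errors.append((addr, pattern, actual))
    return errors


class StuckBitMemory:
    """Hypothetical faulty RAM: one bit of one cell is always stored as 0."""

    def __init__(self, size, bad_addr, bad_bit):
        self._cells = bytearray(size)
        self._bad_addr = bad_addr
        self._mask = 0xFF & ~(1 << bad_bit)

    def __len__(self):
        return len(self._cells)

    def __getitem__(self, addr):
        return self._cells[addr]

    def __setitem__(self, addr, value):
        if addr == self._bad_addr:
            value &= self._mask  # the faulty cell drops one bit on every write
        self._cells[addr] = value


healthy = pattern_test(bytearray(16))
faulty = pattern_test(StuckBitMemory(16, bad_addr=5, bad_bit=0))
print(healthy)  # []
print(faulty)   # [(5, 255, 254), (5, 85, 84)]
```

Note that a mismatch only says the value read back differed from the value written. It does not by itself tell you whether the DIMM, the slot, or the CPU's memory controller corrupted it, which is why swapping sticks and slots is still needed.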
 
RAM is sold in kits for a reason.
A motherboard must drive all the RAM with the same voltage, CAS latency, and speed settings.
The internal workings are designed for the capacity of the kit.
RAM from the same vendor and with the same part number can be made up of differing manufacturing components over time.
Some motherboards can be very sensitive to this.
This is more difficult when more sticks are involved.
RAM must be matched for proper operation.

You can sometimes compensate for errors by increasing the RAM voltage in the motherboard BIOS, if you have a motherboard that permits such settings.

Run memtest86+.
It boots from a USB stick and does not use Windows.
You can download it here:

If you can run a full pass with NO errors, your RAM should be OK.

Running several more passes will sometimes uncover an issue, but it takes more time.
Probably not worth it unless you really suspect a RAM issue.
 

awake283

Great
Jun 23, 2024
82
34
60
You can run unmatched sets of RAM. It'll default to the slowest stick's speed, but that's it. I've only run into real issues with this on AM5 systems. It's still a good idea to run through memtest86+ though.

I still lean towards it being your CPU, but there's no way for us to tell unless you have spare RAM and a spare CPU to test with. :\

edit - my fault, just saw you did run a mem test. Now I'm thinking it may be a memory channel gone bad in your CPU, or the other guy was right and there is something that your system doesn't like about the current memory.
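To separate "bad stick" from "bad slot or memory channel", one approach is to test each stick on its own in more than one slot and see what the failures follow. Below is a rough sketch of that bookkeeping. The stick labels (`S1`, `S2`) and slot names (`A2`, `B2`) are hypothetical, and the `suspects` helper is made up for illustration.

```python
from collections import defaultdict


def suspects(runs):
    """Group single-stick test runs and flag anything that never passed.

    runs: iterable of (stick, slot, passed) tuples.
    A stick is flagged only if it was tried in at least two slots and failed
    in all of them; a slot is flagged only if at least two sticks were tried
    in it and all of them failed.
    """
    by_stick = defaultdict(dict)  # stick -> {slot: passed}
    by_slot = defaultdict(dict)   # slot  -> {stick: passed}
    for stick, slot, passed in runs:
        by_stick[stick][slot] = passed
        by_slot[slot][stick] = passed

    def never_passed(table):
        return sorted(
            name for name, outcomes in table.items()
            if len(outcomes) >= 2 and not any(outcomes.values())
        )

    return {"sticks": never_passed(by_stick), "slots": never_passed(by_slot)}


# Failures follow stick S2 regardless of slot: suspect the stick.
print(suspects([
    ("S1", "A2", True), ("S1", "B2", True),
    ("S2", "A2", False), ("S2", "B2", False),
]))  # {'sticks': ['S2'], 'slots': []}

# Failures follow slot B2 regardless of stick: suspect the slot/channel.
print(suspects([
    ("S1", "A2", True), ("S1", "B2", False),
    ("S2", "A2", True), ("S2", "B2", False),
]))  # {'sticks': [], 'slots': ['B2']}
```

If the failures follow a slot rather than a stick, that points at the board or at the memory channel in the CPU; telling those two apart would still need a spare board or CPU. If every stick passes alone and errors only show up with all four installed, that would fit the "system doesn't like this memory configuration" explanation instead.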