I recently built a new computer and ran into a few issues while putting it together. First, I couldn't get anything on screen and the motherboard wouldn't POST. After I swapped out the video card it would POST and I could get into the BIOS, so I concluded the original video card was bad and thought I was all set. I put in a new video card, then installed Ubuntu 22.04 to make sure the system generally worked and to start stress testing the other components. While doing this I started getting random but frequent crashes. Specifically, the screen would turn black and the computer would become totally unresponsive; I couldn't even power it down by holding the power button for 15+ seconds. The only way to shut it off was to flip the PSU switch or physically unplug it (hate doing that).
After this, once I plugged it back in or flipped the PSU switch back on, it would start up again and run seemingly fine... until, at some random interval (usually within an hour), it crashed the same way again. The crashes had nothing to do with the stress tests: they happened whether I was running a test or just letting the computer sit idle. After a bunch of troubleshooting and stress testing, I verified that the storage, CPU, RAM and PSU all seem to be good (I even bought a PSU tester and verified that all the voltages and ms values were good). So I'm left thinking the motherboard is bad.
When testing the RAM, I followed the usual recommendation and ran only one stick at a time, using PCMemTest-64 v1.5 to check for issues (I also tried stress-ng at one point, with no issues). I tested each stick in slot A2 (A2 and B2 are the slots the motherboard manufacturer recommends for dual channel, and this is a dual channel 3600 kit). I ran everything at stock settings, both with XMP enabled and with it disabled, and never got any errors on either stick in either case after 8+ passes.
I then tried running the computer with RAM in only the A2 slot (so using just one of the two sticks) and everything was golden: no crashes for 6+ hours (before this, it would pretty reliably crash within an hour or less) and all the stress tests passed. Then I moved the single stick to slot B2 and boom, an almost immediate crash. So I thought, "OK, slot B2 is bad, it's a faulty motherboard"... but this is where things get weird. I then ran MemTest with both sticks installed (in A2 and B2) and got no errors in 8+ passes. Then I took the RAM out of slot A2 so that *only* slot B2 was populated, and again got no errors or crashes from MemTest in 8+ passes.
So here's my question: if the motherboard is bad, wouldn't there be errors whenever RAM is in B2, regardless of whether the machine is running the OS or running MemTest? Both should show some sort of problem if the B2 slot is defective, right? And if I *never* get any issues with RAM in B2 under MemTest, is it possible that the problem is actually some sort of hardware incompatibility with the operating system (again, Ubuntu 22.04)? I expected some sort of error message if the kernel were crashing, but I never see anything like that; the whole thing just locks up and goes black.
Anyway, has anyone encountered something like this, and what would be the best route forward? I was planning to eventually install Windows 11 on this computer; should I just go ahead and do that and see if it crashes? Or is it definitely a motherboard issue that MemTest somehow 'sidesteps'? (Or perhaps 8+ passes isn't enough, even though the computer typically crashes within minutes when I'm in Ubuntu?)
Thanks for any help that can be provided. Computer specs below:
Mobo: GIGABYTE B550M DS3H AC AM4 AMD B550 SATA 6Gb/s Micro ATX AMD Motherboard
CPU: AMD Ryzen 5 3rd Gen - RYZEN 5 3600 Matisse (Zen 2) 6-Core 3.6 GHz, Socket AM4 65W 100-100000031BOX
GPU: PowerColor Fighter AMD Radeon RX 6500 XT Gaming Graphics Card with 4GB GDDR6 Memory
RAM: CORSAIR Vengeance RGB Pro 16GB (2 x 8GB) - CMW16GX4M2C3200C16
PSU: MSI MAG A650GL Gaming Power Supply - Full Modular - 80 Plus Gold Certified 650W
Storage: SAMSUNG 970 EVO Plus SSD 500GB NVMe M.2 2280 SSD
Case: ASUS AP201 Type-C Airflow-focused Micro-ATX,Mini-ITX Computer Case