Question: Computer randomly freezes and reboots?

Aug 31, 2024
4
0
10
O Gods and Gurus of Tom's Hardware. I humbly supplicate your help with my current hardware issues:

This computer is now freezing up. Sometimes (very rarely) a BSOD happens. I have managed to get the minidump for 3 of these events (Win 10) and have included the links.

Specs
MBD: MSI Godlike X570
CPU: AMD Ryzen 5950X
GPU: ASUS TUF 4090
RAM: 64GB (G.Skill F4-4000C 18-32GTZR x 2)
PSU: Super Flower Leadex 1600W 80+ Titanium
OS: Win 10/Win 11 Dual Boot

The CPU is cooled with an Arctic Freezer 2 480mm radiator and 2 x fans. I have not changed the thermal paste in 4 years, but there do not seem to be any thermal issues with the CPU as far as I can see.
I am currently running with an open case. There are plenty of fans and airflow. Case is a Fractal Design Meshify 2(?). One of the Meshifys.
Windows 10(old 20H2 I think?) + Windows 11(latest updates/patches as of 02/09/2024 and drivers)
I do not know the BIOS version but I have not had any need to update the BIOS in many years now. So expect a very old version, definitely from 2021. Given what is happening I am loath to attempt a BIOS upgrade in case a fault happens mid-update and bricks the board.

I have run memtest86 twice and it did not show any faults after 4 tests (which takes about 8 hours).

I have the latest Windows 11, fully updated with all updates and patches and the latest drivers for all hardware. I also have an old version of Windows 10 on dual boot. They are on separate disks and the fault occurs on both.

For a long time the motherboard has been complaining "Over Current have been detected on USB device, Reboot on 15 seconds to protect your mainboard." Normally I keep the computer powered for days (weeks, months?) on end, as I host a web site among other reasons. After a few tries the error goes away and the computer boots and, up until now, has worked normally. This has been the case for maybe 18 months now. I do not know whether this fault is related to the freezing or is a parallel issue. It is a hassle but not a major problem, and not worth spending hundreds on a new board to fix if I can possibly avoid it (unless it is the cause of this problem).

This problem seemed to start after I installed Windows 11 on dual boot, although I suspect this is a coincidence and not the cause, or that the several reboots have exacerbated an existing problem into a crisis.

Most of the time there is no BSOD log, but I have managed to capture the minidump on 3 occasions:
https://www.dropbox.com/scl/fi/svnl...6-01.dmp?rlkey=5latzaqpx7mes43hvn26gyhh9&dl=1
https://www.dropbox.com/scl/fi/y0a1...3-01.dmp?rlkey=u9atnwvqfh79zy1lvgdpwb8cv&dl=1
https://www.dropbox.com/scl/fi/a94n...5-01.dmp?rlkey=9y27zxyjal47u7nhe1ipfenlw&dl=1

I have run HWiNFO64; here are the sensor readings from when I started logging up to the crash/freeze (CSV file): https://www.dropbox.com/scl/fi/ocqz..._log.CSV?rlkey=f3zuv610vngflfgk1zsf5scmo&dl=1

I also ran the other HWiNFO program, which creates the graphs, and actually managed to catch a freeze event mid-capture; you can see some of the bitmaps are all zero. I include 2 captures, one covering the 2 minutes before the freeze and the next covering the 2 minutes of the freeze:
https://www.dropbox.com/scl/fi/7387...3h46.zip?rlkey=5gtevy6uiv1zyorub9sv4npan&dl=1

In all cases the logs look ok with no usage or temperature spikes, however I am not a hardware expert, just a boring software guy, and cannot tell if there are abnormal voltages or such as I do not know what the correct baseline is.
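One way to get a rough baseline from a sensor log like the CSV above is to print the minimum, maximum and spread of every numeric column and look for anything that swings more than the rest. The following is only a minimal sketch: it assumes the log has a single header row followed by numeric cells, and the column names and readings in `sample` are made up for illustration, not taken from the linked file.

```python
import csv
import io

def summarize_sensor_csv(text):
    """Return {column: (min, max, spread)} for every column that holds numbers."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return {}
    header, data = rows[0], rows[1:]
    summary = {}
    for i, name in enumerate(header):
        values = []
        for row in data:
            if i >= len(row):
                continue  # short or empty row
            try:
                values.append(float(row[i]))
            except ValueError:
                continue  # skip text cells such as timestamps
        if values:
            lo, hi = min(values), max(values)
            summary[name] = (lo, hi, hi - lo)
    return summary

# Hypothetical sample data; a real log would be read from the CSV file instead.
sample = """Time,Vcore [V],DRAM [V]
12:00:01,1.470,1.412
12:00:03,1.472,1.418
12:00:05,1.468,1.416
"""

for name, (lo, hi, spread) in summarize_sensor_csv(sample).items():
    print(f"{name}: min={lo:.3f} max={hi:.3f} spread={spread:.3f}")
```

A real log may need a specific text encoding when it is opened, and this sketch does not know what a healthy range is for any given rail; it only makes outliers easier to spot.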

I do NOT want to have to buy a new board/CPU and upgrade this now. I absolutely do not fancy the 9950X; I really want this to last another year until AMD can come up with something a bit more sexy, so the cheapest possible hardware replacement to get this fixed and going would be ideal. I spend all my time on this machine and it has not been off since late 2020 (I was lucky enough to be one of the first to get my hands on a 5950X, so that is the timeframe). Having it not working is a major disruption to me and to others who rely on my website (I fell out with my ISP, which is why I am self-hosting, and I am NOT going back!).

Any help or good ideas would be greatly appreciated. It lasts only a few minutes now, if I can even boot, but if there is something I can do to get more diagnostic info, or anything else that might help, I will make it happen if I can.

Thank you so much for any help or insight you can give for this problem.
 

Lutfij

Titan
Moderator
Windows 10(old 20H2 I think?)
Windows 10 is on 22H2 so you have updates pending if you're on 20H2.

I do not know the BIOS version but I have not had any need to update the BIOS in many years now.
This is where you get back to us with the BIOS version you're currently on.

Given what is happening I am loath to attempt a BIOS upgrade in case a fault happens mid-update and bricks the board.
Your board has a BIOS Flash Button, you're not running aground with a bricked BIOS that easily.

For a long time the motherboard has been complaining "Over Current have been detected on USB device, Reboot on 15 seconds to protect your mainboard."
You might want to relocate to another wall outlet and see if the issue persists. To note, how old is the PSU in your build?
 
Aug 31, 2024
4
0
10
The BIOS version is E7C34AMS.1D1. BIOS build date: 24 Feb 2021.

The PSU is as old as the PC, so about 4 years now.

It has been plugged into different wall sockets, although I have always used one of these multi-plug adapters, which has its own surge protection, just to protect the PC in case of a lightning strike. I will try plugging it directly into the wall on another socket without the adapter, see if it makes a difference, and report back.

I have all the voltages and their variations in the log files I provided, especially the CSV file from HWiNFO64, which seems very comprehensive. Now that I am in the BIOS I do see VCore and DDR voltage fluctuating slightly: DDR voltage fluctuates from 1.412 to 1.418, and VCore fluctuates around the 1.47 mark by about 0.002 V.
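To put those BIOS readings on a common scale, the swing can be expressed as a percentage of the midpoint of each range. This is just arithmetic on the figures quoted above; the VCore range of 1.468 to 1.472 is an assumption about what "around 1.47 by about 0.002" means, and the function name is made up for the example.

```python
def peak_to_peak_percent(low, high):
    """Peak-to-peak swing as a percentage of the midpoint of the range."""
    mid = (low + high) / 2
    return (high - low) / mid * 100

# DDR range as read in the BIOS.
ddr = peak_to_peak_percent(1.412, 1.418)
# Assumption: VCore "around 1.47 by about 0.002" read as 1.468 to 1.472.
vcore = peak_to_peak_percent(1.468, 1.472)

print(f"DDR:   {ddr:.2f}%")
print(f"VCore: {vcore:.2f}%")
```

With those inputs the swing works out to roughly 0.42% for DDR and 0.27% for VCore. Whether that is within tolerance for this board is a separate question that the arithmetic does not answer.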
 

ubuysa

Distinguished
I rather think that the over current error you're seeing is because a device plugged into a USB port is trying to draw more current than the port can supply. For USB2 ports that limit is 500mA, for USB3 ports it's 900mA. You should invest in a mains-powered USB hub and move the high current draw device to the hub.

The dumps are all DPC_WATCHDOG_TIMEOUT bugchecks that happened because a group of DPCs/ISRs ran for too long. This may well be related to the over current error. If the USB device can't draw enough current from the port to operate properly then we might expect ISR and DPC errors for that device.
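For a sense of scale, the port limits mentioned above can be turned into a power budget at the nominal 5 V USB bus voltage. This is a rough sketch only: the 700 mA device is a hypothetical example, and the helper names are made up for illustration.

```python
USB_NOMINAL_VOLTS = 5.0
# Per-port current limits quoted above, in milliamps.
PORT_LIMIT_MA = {"usb2": 500, "usb3": 900}

def port_power_watts(port):
    """Maximum power a port can supply at the nominal bus voltage."""
    return USB_NOMINAL_VOLTS * PORT_LIMIT_MA[port] / 1000

def exceeds_port_limit(device_ma, port):
    """True if a device drawing device_ma would exceed the port's limit."""
    return device_ma > PORT_LIMIT_MA[port]

for port in PORT_LIMIT_MA:
    print(f"{port}: {port_power_watts(port)} W")

# Hypothetical device drawing 700 mA.
print(exceeds_port_limit(700, "usb2"))
print(exceeds_port_limit(700, "usb3"))
```

That gives 2.5 W for a USB2 port and 4.5 W for a USB3 port, so the hypothetical 700 mA device would be over budget on USB2 but within it on USB3.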
 
Aug 31, 2024
4
0
10
One of the first things I tried when this error started was to get an externally powered USB hub. It does not help. Unplugging all USB devices does not help either. When the over current error happens persistently, the board seems to need a little time to cool down; if I leave it for about 20 minutes, it then boots fine.
 

ubuysa

Distinguished
I would update that BIOS. There have been many updates since the version you have, several with AGESA updates and it's usually important that you install those. The latest BIOS is 7C34v1O, dated 9th August 2024.

These links are a few years old now but they might be relevant. You don't appear to be alone...
https://forum-en.msi.com/index.php?...ce-msi-prestige-x570-creation-bricked.372615/
https://forum-en.msi.com/index.php?threads/godlike-x570-usb-over-current-on-two-systems.345980/
https://forum-en.msi.com/index.php?threads/usb-overcurrent-issue-with-mpg-x570-gaming-plus.361779/

Can you also please upload the kernel dump? It's the file C:\Windows\Memory.dmp.