Question: Computer randomly freezes and reboots?

Aug 31, 2024
7
0
10
O Gods and Gurus of Tom's Hardware, I humbly supplicate your help with my current hardware issues:

This computer is now freezing up. Sometimes (very rarely) a BSOD happens. I have managed to get the minidump for 3 of these events (Win10) and have included the links.

Specs
MBD: MSI Godlike X570
CPU: AMD Ryzen 5950X
GPU: ASUS TUF 4090
RAM: 64GB (G.Skill F4-4000C18-32GTZR x 2)
PSU: Super Flower 1600W 80+ Titanium Leadex
OS: Win 10/Win 11 Dual Boot

The CPU is cooled with an Arctic Freezer 2 480mm radiator and 2x fans. I have not changed the thermal paste in 4 years, but there do not seem to be any thermal issues with the CPU as far as I can see.
I am currently running with an open case. There are plenty of fans and airflow. The case is a Fractal Design Meshify 2(?), one of the Meshifys.
Windows 10 (old 20H2, I think?) + Windows 11 (latest updates/patches and drivers as of 02/09/2024)
I do not know the BIOS version, but I have not had any need to update the BIOS in many years, so expect a very old version, definitely from 2021. Given what is happening, I am loath to attempt a BIOS upgrade in case a fault happens mid-update and bricks the board.

I have run MemTest86 twice and it did not show any faults after 4 tests (which take about 8 hours).

I have the latest Windows 11, fully updated with all patches and the latest drivers for all hardware. I also have an old version of Windows 10 on dual boot. They are on separate disks and the fault occurs on both.

For a long time the motherboard has been complaining: "Over Current have been detected on USB device, Reboot on 15 seconds to protect your mainboard." Normally I keep the computer powered on for days (weeks, months?) on end, as I host a website, among other reasons. After a few tries the error clears and the computer boots, and it has worked normally until now. This has been the case for maybe 18 months or so. I do not know if this fault is related to this issue or is a parallel issue. It is a hassle but not a major problem, and not worth spending hundreds on a new board to fix if I can possibly avoid it (unless it is the cause of this problem).

This problem seemed to start after I installed Windows 11 on dual boot, although I suspect this is a coincidence rather than the cause, or that the several reboots turned an existing problem into a crisis.

Most of the time there is no BSOD log, but I have managed to capture the minidump on 3 occasions:
https://www.dropbox.com/scl/fi/svnl...6-01.dmp?rlkey=5latzaqpx7mes43hvn26gyhh9&dl=1
https://www.dropbox.com/scl/fi/y0a1...3-01.dmp?rlkey=u9atnwvqfh79zy1lvgdpwb8cv&dl=1
https://www.dropbox.com/scl/fi/a94n...5-01.dmp?rlkey=9y27zxyjal47u7nhe1ipfenlw&dl=1

I have run HWiNFO64; the sensor readings from when I started logging up to the crash/freeze are here (CSV file): https://www.dropbox.com/scl/fi/ocqz..._log.CSV?rlkey=f3zuv610vngflfgk1zsf5scmo&dl=1

I also ran the other HWiNFO program, which creates the graphs, and actually managed to catch a freeze event mid-way; you can see some of the bitmaps are all zero. I include 2 captures, one of the 2 minutes before the freeze and the next of the 2 minutes of the freeze:
https://www.dropbox.com/scl/fi/7387...3h46.zip?rlkey=5gtevy6uiv1zyorub9sv4npan&dl=1

In all cases the logs look OK, with no usage or temperature spikes; however, I am not a hardware expert, just a boring software guy, and cannot tell if there are abnormal voltages or the like, as I do not know what the correct baseline is.

I do NOT want to have to buy a new board/CPU and upgrade now. I absolutely do not fancy the 9950X; I really want this to last another year until AMD comes up with something a bit more sexy, so the cheapest possible hardware replacement to get this fixed and going would be ideal. I spend all my time on this machine and it has not been off since late 2020. I was lucky enough to be one of the first to get my hands on a 5950X, so that is the timeframe. Having this not working is a major disruption to me and to others who rely on my website (I fell out with my ISP, which is why I am self-hosting, and I am NOT going back!).

Any help or good ideas would be greatly appreciated. It now lasts only a few minutes, if I can even get it to boot, but if there is something I can do to get more diagnostic info, or anything else that might help, I will make it happen if I can.

Thank you so much for any help or insight you can give for this problem.
 

Lutfij

Titan
Moderator
Windows 10 (old 20H2, I think?)
Windows 10 is on 22H2 so you have updates pending if you're on 20H2.

I do not know the BIOS version, but I have not had any need to update the BIOS in many years.
This is where you get back to us with the BIOS version you're currently on.

Given what is happening, I am loath to attempt a BIOS upgrade in case a fault happens mid-update and bricks the board.

Your board has a BIOS Flash Button, you're not running aground with a bricked BIOS that easily.

For a long time the motherboard has been complaining "Over Current have been detected on USB device, Reboot on 15 seconds to protect your mainboard."
You might want to relocate to another wall outlet and see if the issue persists. To note, how old is the PSU in your build?
 
Original poster
The BIOS version is E7C34AMS.1D1, BIOS build date 24 Feb 2021.

The PSU is as old as the PC, so about 4 years now.

It has been plugged into different wall sockets, although I have always used one of these multi-plug adapters, which has its own surge protection, just to protect the PC in case of a lightning strike. I will try plugging it directly into another wall socket without the adapter, see if it makes a difference and report back.

I have all the voltages and their variations in the log files I provided; the CSV file from HWiNFO64 in particular seems very comprehensive. Now that I am in the BIOS, I can see Vcore and DDR voltage fluctuating slightly. DDR voltage fluctuates from 1.412 to 1.418 V. Vcore fluctuates around the 1.47 V mark by about 0.002 V.
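One way to get a baseline from the CSV log rather than by eye is to compute the minimum, maximum and spread of every voltage column. A minimal Python sketch, assuming the HWiNFO64 CSV has one header row and that the voltage columns are the ones whose names end in "[V]"; the column names in the test data are hypothetical, and the cp1252 encoding used when opening the file is an assumption about how the log was saved.

```python
import csv
from typing import Dict, Iterable, Tuple


def voltage_spread(lines: Iterable[str]) -> Dict[str, Tuple[float, float, float]]:
    """Return {column: (min, max, max - min)} for every column whose name ends in "[V]".

    `lines` is any iterable of CSV text lines, e.g. an open file object.
    Cells that are not numbers (blanks, or a header row repeated at the
    end of the log) are skipped.
    """
    bounds: Dict[str, Tuple[float, float]] = {}
    for row in csv.DictReader(lines):
        for name, cell in row.items():
            # Extra cells beyond the header end up under the key None; ignore them.
            if name is None or not name.strip().endswith("[V]"):
                continue
            try:
                value = float(cell)
            except (TypeError, ValueError):
                continue  # blank, missing or non-numeric cell
            lo, hi = bounds.get(name, (value, value))
            bounds[name] = (min(lo, value), max(hi, value))
    return {name: (lo, hi, hi - lo) for name, (lo, hi) in bounds.items()}


def print_voltage_summary(path: str) -> None:
    # Assumption: the log was saved in the Windows "cp1252" code page.
    with open(path, newline="", encoding="cp1252", errors="replace") as log:
        for name, (lo, hi, spread) in sorted(voltage_spread(log).items()):
            print(f"{name}: min {lo:.3f} V, max {hi:.3f} V, spread {spread:.3f} V")
```

If several sensors share the same column name, `csv.DictReader` keeps only the last one, so those columns would need renaming (or a plain `csv.reader`) to be summarised separately.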
 

ubuysa

Distinguished
I rather think that the over current error you're seeing is because a device plugged into a USB port is trying to draw more current than the port can supply. For USB 2 ports that limit is 500mA; for USB 3 ports it's 900mA. You should invest in a mains-powered USB hub and move the high-current device to the hub.
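To make that arithmetic concrete: a bus-powered device, or several devices sharing one port through an unpowered hub, overloads the port when the total draw exceeds the limits above. A minimal Python sketch using only those two limits; the per-device draw figures in the usage example are hypothetical, not measurements from this build.

```python
from typing import Iterable

# Per-port current budgets quoted above, in milliamps.
PORT_LIMIT_MA = {"usb2": 500, "usb3": 900}


def over_budget(port_type: str, device_draws_ma: Iterable[int]) -> bool:
    """Return True if the combined draw of bus-powered devices on one port exceeds its limit.

    port_type is "usb2" or "usb3"; device_draws_ma lists each device's draw in mA
    (several entries model devices sharing the port through an unpowered hub).
    """
    return sum(device_draws_ma) > PORT_LIMIT_MA[port_type]
```

For example, two hypothetical devices drawing 300 mA and 250 mA overload a USB 2 port (550 > 500) but fit on a USB 3 port. A mains-powered hub supplies its devices itself, which takes them off the port's budget.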

The dumps are all DPC_WATCHDOG_TIMEOUT bugchecks, which happen when a DPC/ISR, or a group of them, runs for too long. This may well be related to the over current error: if the USB device can't draw enough current from the port to operate properly, then we might expect ISR and DPC errors for that device.
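WinDbg's `!analyze -v` is the usual way to read these dumps. For a quick check of which bug check each file recorded, here is a minimal Python sketch that reads the code and its four parameters straight from the dump header. The header layout used here (signature "PAGE" followed by "DU64", a 32-bit BugCheckCode at offset 0x38, four 64-bit parameters from offset 0x40) is an assumption based on the commonly described 64-bit Windows kernel dump format; it is not stated in this thread and does not replace a proper stack analysis.

```python
import struct
from typing import Tuple

# Names of the two watchdog bug checks (0x102 and 0x133 in Microsoft's bug check reference).
BUGCHECK_NAMES = {0x102: "DPC_WATCHDOG_TIMEOUT", 0x133: "DPC_WATCHDOG_VIOLATION"}


def read_bugcheck(header: bytes) -> Tuple[int, Tuple[int, ...]]:
    """Return (bug check code, four parameters) from a 64-bit kernel dump header.

    Assumed layout: b"PAGE" + b"DU64" at offset 0, a little-endian 32-bit
    BugCheckCode at 0x38, then four 64-bit parameters starting at 0x40.
    """
    if header[0:4] != b"PAGE" or header[4:8] != b"DU64":
        raise ValueError("not a 64-bit Windows kernel dump header")
    (code,) = struct.unpack_from("<I", header, 0x38)
    params = struct.unpack_from("<4Q", header, 0x40)
    return code, params


def describe_dump(path: str) -> None:
    # Only the first 0x60 bytes are needed for the fields read above.
    with open(path, "rb") as dump:
        code, params = read_bugcheck(dump.read(0x60))
    name = BUGCHECK_NAMES.get(code, "other")
    print(f"{path}: 0x{code:X} ({name}), params " + ", ".join(f"0x{p:X}" for p in params))
```

`describe_dump` works the same on a minidump or on C:\Windows\Memory.dmp, since both are expected to start with this header under the stated assumption.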
 
Original poster
One of the first things I tried when this error started was an externally powered USB hub. It does not help. Unplugging all USB devices does not help either. When the over current error happens persistently, it seems to need a little time to cool down; normally, if I leave it for about 20 minutes, it then boots fine.
 

ubuysa

Distinguished
I would update that BIOS. There have been many updates since the version you have, several with AGESA updates and it's usually important that you install those. The latest BIOS is 7C34v1O, dated 9th August 2024.

These links are a few years old now but they might be relevant. You don't appear to be alone...
https://forum-en.msi.com/index.php?...ce-msi-prestige-x570-creation-bricked.372615/
https://forum-en.msi.com/index.php?threads/godlike-x570-usb-over-current-on-two-systems.345980/
https://forum-en.msi.com/index.php?threads/usb-overcurrent-issue-with-mpg-x570-gaming-plus.361779/

Can you also please upload the kernel dump, it's the file C:\Windows\Memory.dmp.
 
Original poster
Hi, I have finally managed to upload the MEMORY.DMP file: https://www.transfernow.net/dl/20240906SojGbQpN
I tried to update the BIOS to the most recent version using the MSI.ROM file on a USB stick formatted to FAT32, but I keep getting the "Update Error" message on the motherboard's screen.

At this point I am losing a lot by not having my system running, so I need to make a fast decision about what I am going to do.

If it is the board that is faulty, then I am most likely better off just getting a new, cheap B550 motherboard.

The alternative is a full CPU/board/cooler/RAM/SSD upgrade, but I really don't want to spend money buying all of that only to find out that the PSU is faulty and I bought a load of upgrade components I don't need. I was thinking that if I am going to do it, I may as well upgrade properly, and I'm thinking of getting:

Intel Core i9-14900KS 3.2 GHz 24-Core Processor
ARCTIC Liquid Freezer III 72.8 CFM Liquid CPU Cooler
ASRock Z790 Taichi Carrara EATX LGA1700 Motherboard
Corsair Vengeance 64 GB (2 x 32 GB) DDR5-6600 CL32 Memory
Sabrent Rocket 5 2 TB M.2-2280 PCIe 5.0 X4 NVME Solid State Drive

I do need to move quickly, because the longer this PC is down, the more money I am losing, and at this point I am ready to just throw money at the problem to make it go away. I may get a new case, but I want to reuse the graphics card, storage and PSU from my existing build, which I will cannibalise.

My main fear is that these Intel CPUs seem designed to fail as soon as their warranty expires, and I really want this to run for at least 4-5 years. I did not want to do this for another year, but it seems the universe had other plans.


It seems to make no sense to upgrade right now, as the new Intel chip is likely to drop next month and the AMD 9950X3D by year's end, so I will go with the original plan and downgrade my system pending new hardware becoming available.

That will also tell me whether it really is the board, the PSU or something else. I will keep the forum updated. I would still appreciate any advice, though; I do read it all.

Help or best advice would be great.
 
Original poster
OK, I replaced the motherboard with a B550 board and the Over Current Have Been Detected error is gone, but the freezing and rebooting, which is the real issue, still persists. I swapped out both RAM modules and it is still happening, so it's not the RAM.

That leaves either the CPU or the PSU as the potential culprit, I think.

And I don't know which. I am leaning more towards the PSU at this point, but that is purely a guess.

Is there any way to tell or test if it is CPU or PSU that is going wrong ?
 

DaleH

Notable
Mar 24, 2023
534
57
970
Easiest and best way is to swap out the PSU with a known good one.
 
Original poster
I agree, but the only other one I have is 20 years old now. I doubt it even has the modern 8-pin VGA connectors. I would need to buy a new PSU, and I don't want to do that unless I know for sure it is the culprit.

Is there any way to disable cores on the AMD Ryzen 5950X from the BIOS or similar that would let me eliminate a faulty CPU as the potential problem?

Another option is to open the PSU and see if there is any obvious physical degradation like mushroomed capacitors or such.

It actually should still be under warranty and is a very expensive PSU to replace so I want to get this right.