[SOLVED] Threadripper dead?

Jun 28, 2020
I just assembled my Threadripper 1920X with an S24 cooler, and it was working fine. I didn't overclock or change any BIOS settings besides the boot order.

Since I wanted to do some heavy work, I left 8 CPU-heavy tasks running overnight. The next morning the image was frozen but the system was still on. After rebooting, the Aorus X399 motherboard shows error code 3E.

I tried reassembling it and removing the RAM, and I still get the same error code on the motherboard.

My only suspicion is that the cooler was not mounted well and the CPU could be burnt?

Why didn't the automatic thermal protection kick in and switch off the system before that happened?

Is my cpu dead?
 
I would suggest reinstalling the CPU itself.
"The TR4 socket can be quite finicky and cause bad connections on all those thousands of pins. I'd try loosening the 3 screws holding the CPU down and then tightening them again.
Oh, and this time give each one a couple of turns on the threads before you start torquing them. Only do the final tightening in the numbered order."

You may have put too much torque on one end.
 
I tried to reposition it again... No luck whatsoever. Sometimes it shows error code 10, then goes back to 3E.

I looked very closely at the CPU pins, and it could be my imagination, but one of them looked a bit brownish... Is it possible to see if something burned inside?

I am a bit desperate now to be honest 🙁
 
So we don't have to look up your mobo manual and find the codes, could you copy and paste what the manual says each of those codes (3E and 10) indicates?

You may want to change one setting in the BIOS: the RAM speed. It may be defaulting to something that is not as friendly to the CPU. Generally lower is better when troubleshooting, and AMD CPUs tend to be more picky about RAM speed.
 
Apologies...
3E = PCH PEI Initialization
10 = PEI Core is started

Regarding the RAM, I cannot change the speed since I cannot get into the BIOS. I don't think it is the memory, though. I tried leaving just one stick installed, without luck.
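
For anyone tracking these codes while troubleshooting, here is a minimal sketch of a lookup table. It only contains the two descriptions quoted above from the manual; the function name `describe_post_code` and the fallback message are made up for illustration, and the table is not a complete list of the board's codes.

```python
# POST codes quoted in this thread from the motherboard manual.
# Only these two entries come from the thread; this is NOT a complete table.
POST_CODES = {
    "10": "PEI Core is started",
    "3E": "PCH PEI Initialization",
}


def describe_post_code(code: str) -> str:
    """Return the quoted description for a two-digit hex POST code.

    Hypothetical helper: codes not listed in this thread fall through
    to a reminder to check the manual.
    """
    key = code.strip().upper()
    return POST_CODES.get(key, "unknown code (check the motherboard manual)")


if __name__ == "__main__":
    # Codes seen on the debug display in this thread.
    for seen in ["3e", "10"]:
        print(seen.upper(), "=", describe_post_code(seen))
```

Writing down each code and the order in which it appears (here, 10 and then back to 3E) makes it easier to report the sequence accurately when asking for help.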
 
I did try what was suggested: I removed the CPU and seated it again, but the result is still the same.

Why would it work before and then stop working after 12 hours of work?

It seems strange to me. I have already tried to reposition it 2-3 times.
 
The light next to the cpu is on (permanent red). And all the fans are on, including those of the GPU. If I take out the GPU, the error is the same. I don't have another GPU but seems unlikely to be related to it.

Nothing appears on screen while switching it on. But the error code says something is wrong... When it worked it was displaying 00 I believe.
 
The system would definitely fail without a GPU installed.
 
Just double-checking, but you're using the Celsius S24's TR4 mounting bracket, right?

As for why it would work before and then stop after 12 hours of work: that's what we're trying to find out... unless you happen to have a camera watching the system while you were away.
There are a number of things that could've happened while you were away, but not much comes to my mind, other than:
-Thermal: CPU, GPU, VRM, storage...
-Power: I can't find ANYTHING on that 1600 W AFOX PSU besides the specs.
 
Yes, I used the TR4 mounting bracket, the one from the Threadripper box.

Yes, thanks I appreciate the help.

The PSU is pretty cheap... Do you think it may have fried some component of the motherboard or the CPU? I don't know what PCH or PEI is, to be honest.
 
When you reseated the CPU, did the mounting screws "click" when you tightened them (indicating proper torque)?
If so, chances are the CPU or board is damaged. And an odd question: why would you knowingly use a "cheap" power supply on such a nice system? It could very well have destroyed the system.
 
I don't think the psu did, but I don't have any way to prove it.

10 = PEI Core is started, is some kind of memory error
3E = PCH PEI Initialization, also appears to be memory, or memory controller related

You didn't add on memory, did you?

Have you tried clearing CMOS with just one dimm installed?
Clear CMOS:
-turn the PC off
-unplug the psu from the wall
-remove the silver coin battery
-hold down the power button on the chassis for 15 seconds
-put the battery back in, plug the psu, and turn 'er on
 
To the question about the mounting screws: yes, they clicked and the CPU is properly seated on the motherboard... I didn't force it, and the AMD torque screwdriver prevents tightening it any further.

On the question about the cheap power supply... Do you think it may have damaged the CPU or the motherboard?
 
I didn't add any memory... I also tried with one stick, without luck.

I reset the CMOS with a screwdriver touching both pins... Do you think removing the battery could make a difference? I didn't change anything in the BIOS besides the boot order, and it worked fine before that.
 
To do a proper clear CMOS:
-turn the PC off
-unplug the psu from the wall
-remove the battery from the motherboard
-press and hold the power button on the chassis for 15 seconds. This discharges any leftover power in the capacitors.
-put everything back and try again


Boot order: Was Compatibility Support Module (CSM) enabled?
There are 2 different drivers that run your storage: Legacy, and Windows UEFI.
All my storage drives use the Legacy driver, and if I set CSM to use the UEFI driver only, then save and exit, the system brings me right back to the BIOS because it no longer detects my storage drives.
If that were the case with you, it should've brought you right back to the BIOS.
 
Solution