[SOLVED] Memory channel C not detected while system is warm?

Status
Not open for further replies.
Feb 13, 2021
I just built a system with the following components:

MSI Creator TRX40 motherboard
Threadripper 3970X
8 x 32GB Corsair Vengeance LPX memory
Ubuntu 20.04

When I boot the system from a cold state, all the RAM is detected (256GB). When the system is warm, channel C drops out, and memory in those slots is not detected regardless of which sticks I place there or how much total memory is installed. I can move modules around (installing them in the correct slots), but anything placed in these sockets is not detected. The BIOS lists 8 sticks of Corsair RAM, but only shows 192GB. I don't know if the temperature is just a coincidence or a cause. The CPU runs at approx. 30-35C at idle and 60-62C under heavy load, and does not appear to be overheating. I re-seated the CPU this morning and could not see any obviously bent pins.
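For anyone who wants to cross-check what the OS sees against the BIOS screen: on Ubuntu, `sudo dmidecode --type 17` prints one "Memory Device" block per slot. Below is a minimal Python sketch for tallying that output. The sample text is a made-up excerpt (not output from this machine), the slot labels are placeholders, and the assumption that an untrained DIMM shows up as "No Module Installed" may not hold on every board.

```python
import re

# Hypothetical excerpt in the style of `sudo dmidecode --type 17` output.
# Slot labels and sizes are invented for illustration.
SAMPLE = """\
Memory Device
    Size: 32 GB
    Locator: DIMM 0
    Bank Locator: P0 CHANNEL A

Memory Device
    Size: No Module Installed
    Locator: DIMM 0
    Bank Locator: P0 CHANNEL C
"""

def parse_memory_devices(text):
    """Split the text into blank-line-separated blocks of 'Key: Value' fields."""
    devices = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.strip().partition(": ")
            if sep:
                fields[key] = value
        if "Size" in fields:
            devices.append(fields)
    return devices

def size_in_gb(size_field):
    """Return the size in whole GB, or 0 if the slot reports no module."""
    match = re.fullmatch(r"(\d+)\s*(MB|GB)", size_field)
    if not match:
        return 0
    amount, unit = int(match.group(1)), match.group(2)
    return amount // 1024 if unit == "MB" else amount

devices = parse_memory_devices(SAMPLE)
total_gb = sum(size_in_gb(d["Size"]) for d in devices)
missing = [d["Bank Locator"] for d in devices if size_in_gb(d["Size"]) == 0]
print(f"{total_gb} GB detected; empty or undetected: {missing}")
```

To run it on a real system, the `SAMPLE` string would be replaced with the captured command output.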

Upon boot, it briefly throws a "Memory PMU Training error at Socket 0 Channel 4 DIMM 0 & DIMM 1" message.

I am fairly sure I can rule out bad RAM sticks, as the error consistently follows that socket, not an individual module. Is this sound reasoning?
If it's not the RAM, I think it must be a motherboard issue or a bad CPU. Is there anything else that might cause this?
Can anyone help me diagnose the actual cause? I'm at a loss here; any advice would be hugely appreciated!
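To make the swap-test reasoning explicit, here is a rough Python sketch of the logic. The stick and slot labels and the observations are hypothetical, not a log of my actual tests. A slot is flagged when everything tried in it failed but at least one of those sticks worked elsewhere; a stick is flagged when it failed everywhere but at least one of those slots worked with another stick.

```python
from collections import defaultdict

# Hypothetical swap-test log: (stick label, slot label, detected on warm boot).
observations = [
    ("stick1", "C1", False),
    ("stick1", "A1", True),
    ("stick2", "C1", False),
    ("stick2", "B1", True),
    ("stick3", "C2", False),
    ("stick3", "D1", True),
]

def suspects(observations):
    """Return (suspect_slots, suspect_modules) from a list of swap results."""
    slot_results = defaultdict(list)
    module_results = defaultdict(list)
    for module, slot, detected in observations:
        slot_results[slot].append((module, detected))
        module_results[module].append((slot, detected))

    # Anything that worked at least once is treated as known-good.
    good_modules = {m for m, rs in module_results.items() if any(d for _, d in rs)}
    good_slots = {s for s, rs in slot_results.items() if any(d for _, d in rs)}

    bad_slots = sorted(
        s for s, rs in slot_results.items()
        if s not in good_slots and any(m in good_modules for m, _ in rs)
    )
    bad_modules = sorted(
        m for m, rs in module_results.items()
        if m not in good_modules and any(s in good_slots for s, _ in rs)
    )
    return bad_slots, bad_modules

bad_slots, bad_modules = suspects(observations)
print("suspect slots:", bad_slots)
print("suspect modules:", bad_modules)
```

Note that this only separates the sticks from the slots. It cannot say whether a flagged slot points to the motherboard or to the CPU's memory controller.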

Thanks!
 
Feb 13, 2021
This issue turned out to be a bad motherboard. The socket pins look perfect, but the connection with the CPU is bad somewhere in the motherboard. I can't tell if it is really a mispositioned pin or something else entirely.

The short story is, I narrowed this issue down to either the processor or the mobo. New 3970X processors are expensive, hard to come by (sold out everywhere or drastically marked up as I write this), and not returnable once opened. Motherboards are easier to acquire, and trying a new one is a smaller investment. The motherboard is also far more likely to be the problem in this case.

And the motherboard was the problem. I swapped it out, and my system runs flawlessly now!
 