[SOLVED] Memory Errors on a New (to me) Server.. I'm not sure where to turn

Skipple

Honorable
Aug 12, 2014
9
0
10,520
Hi guys,

So I bought a used server on eBay. It's a SuperMicro SYS-6027R-3RF4+. I was planning on using it as a NAS and Plex server, and running a few VMs as well. The plan is to run UNRAID as the OS.

When I attempted to launch the UNRAID GUI I was getting a kernel error:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

Doing some digging on this error, a lot of people trace it back to memory problems. Fair enough. Maybe the memory included with the server I bought is faulty.

So I spun up MemTest to see if that threw anything. Immediately threw back mem errors. Okay, so let's start isolating memory modules then.

I test each memory module one by one. Every single one throws errors. Oh damn, okay then maybe there is something wrong with the DIMM slot.

I test the memory in a different DIMM slot. Same thing.

Alright then. Maybe there is an issue with the CPU. I reseat both CPUs, check the pins on the motherboard, they look fine.

Fire it back up, same issue.

Interesting thing is that MemTest is throwing around 300-400 errors at the very beginning of the test, then never throws another error.

I'm not sure where to turn here... Does anyone have any ideas?

Here is an album of the memory errors I was getting on each stick of RAM

Motherboard: X9DR3-LN4F+ Rev 1.01
CPUs: 2x Intel Xeon Evaluation CPU 2680 V1
RAM: 4 x 4GB - PC3 DDR3 - ECC
 
Solution
Because you have a dual-socket system, you probably have to put DIMMs in the slots for both CPUs, correct? I'm trying to think of other tests you can do if you don't have another system, and the only thing I can think of is to remove a CPU so you can literally test just one module and socket at a time, but I don't think that would help.

What you really need to do is test the memory in another system. Is it ECC registered or just ECC? If it's plain ECC, you can test it in a non-ECC system, but that won't test the ECC part, which may be the part that is failing. It is very odd that all the modules are showing the error, though. That sounds like a CPU/socket/motherboard issue, but without ruling out the RAM itself we can't go down that road yet.
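
To make the isolation logic above a bit more concrete, here is a rough Python sketch of how per-module, per-slot test results could be read. It is only an illustration: the function name `likely_culprit`, the result labels, and the module/slot names are made up for this example, and it assumes you have recorded a simple pass/fail for each combination you tested.

```python
from collections import defaultdict


def likely_culprit(results):
    """Guess where a memory fault lives from pass/fail test results.

    `results` maps (module, slot) -> True if the memory test reported
    errors for that module in that slot. All names are hypothetical.
    """
    if not results:
        return "no data"
    if not any(results.values()):
        return "no errors seen"
    if all(results.values()):
        # Every module fails in every slot tested, so no single DIMM or
        # slot stands out. Suspect something they all share.
        return "shared component (CPU, socket, motherboard, or firmware)"

    by_module = defaultdict(list)
    by_slot = defaultdict(list)
    for (module, slot), failed in results.items():
        by_module[module].append(failed)
        by_slot[slot].append(failed)

    # A module (or slot) is suspicious if it failed in every test it took part in.
    bad_modules = sorted(m for m, r in by_module.items() if all(r))
    bad_slots = sorted(s for s, r in by_slot.items() if all(r))

    if bad_modules and not bad_slots:
        return "module(s): " + ", ".join(bad_modules)
    if bad_slots and not bad_modules:
        return "slot(s): " + ", ".join(bad_slots)
    return "inconclusive"
```

With every stick failing in every slot, as described in this thread, the sketch lands on the "shared component" branch, which is why testing the RAM in a second system is the quickest way to rule the modules in or out.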
 

Skipple

Just as an update, I managed to figure it out. Turns out a BIOS update made the difference. Went from BIOS 3.2 to 3.3 and BOOM. Everything works as normal.
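
For anyone landing here with similar symptoms, a quick way to see which BIOS version a board is running from a booted Linux system is to read it from sysfs. The snippet below is a minimal sketch, not a Supermicro tool: it assumes a Linux kernel that exposes `/sys/class/dmi/id/bios_version`, assumes the version string contains plain numbers such as `3.2` or `3.3a`, and the helper names (`parse_version`, `read_bios_version`, `is_older`) are made up for this example. The `3.3` threshold is simply the version that fixed this particular board.

```python
import re
from pathlib import Path

# Linux exposes DMI/SMBIOS fields here; other systems may not have this file.
DMI_BIOS_VERSION = Path("/sys/class/dmi/id/bios_version")


def parse_version(text):
    """Turn a string like '3.3a' into a tuple of ints, e.g. (3, 3)."""
    return tuple(int(n) for n in re.findall(r"\d+", text))


def read_bios_version(path=DMI_BIOS_VERSION):
    """Return the BIOS version string, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def is_older(current, minimum):
    """True if `current` sorts before `minimum` when compared numerically."""
    return parse_version(current) < parse_version(minimum)


if __name__ == "__main__":
    version = read_bios_version()
    if version is None:
        print("BIOS version not available from sysfs")
    else:
        print(f"BIOS {version}; older than 3.3: {is_older(version, '3.3')}")
```

Comparing tuples of integers rather than raw strings avoids the classic trap where `"3.10"` sorts before `"3.3"` as text.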
 