Repeated OS Corruption - Hardware Issue?

Michael Paulmeno

Aug 30, 2013
I am in the process of building a home server for myself. So far I have assembled a system with the following specs and gotten it to POST:

Supermicro X10DRL-i motherboard
1 Intel Xeon E5-2603 v3 CPU
16GB DDR4 server RAM (1 Crucial stick)
2 Crucial MX200 250GB SSDs

All of the above is housed in a case recycled from an older system, with a roughly 750-watt power supply (original to the case) and a DVD drive. The drive cage has six bays and a hot-swappable backplane that the drives attach to. I also have a WD SE 2TB hard drive, which has not been installed.

Here is how I ended up in my current predicament. After putting everything together, I used the on-board RAID controller to create a RAID 1 array from the two SSDs. However, after I successfully installed Windows Server 2012, the array collapsed. I rebuilt it and reinstalled Windows, but the problem persisted. After losing the array two more times, I gave up on RAID; the on-board Intel firmware RAID controller has a poor reputation anyway, so I decided to use one drive and keep the second as a cold spare. The OS actually survived my deleting the array, but it was horribly slow and buggy every time I logged in, so I used PartedMagic to securely erase the SSDs and started from scratch.

Everything worked fine at first. Windows installed (albeit with some complaints about not being able to format the volume on the disk; apparently I don't know how to install Windows correctly), I set up NIC teaming, set the host name, and started Windows Update. The update (a new version of the update agent) failed. Control Panel and Windows Search then became very slow to respond. I tried sfc, a clean boot, Check Disk, and optimizing the drive. Nothing helped. Eventually Check Disk failed with the following message:
https://mjpscreenshots.s3.amazonaws.com/check%20disk%20error.jpg

Now after POST, the computer does an "Automatic Repair" and then reboots.

Does anyone have any insights? It can't be the RAM: I ran Memtest86, and it passed all 13 tests on each of the four passes. The computer sits underneath a Dell PowerEdge R210 that I am using as a firewall, and the room it's in runs about 77°F (25°C). However, all the case fans work fine. Other than bad hardware, I can't think of any source of trouble.
 
You're trying to do too many things at the same time.
You undid the RAID, finally got it running (but not right), and immediately went into NIC teaming.

Get a clean working OS install first. Then add the fancy bits.

Wipe the drive completely. All of it.
Reinstall the OS. Get that running 100%.
Then move on.
 
I think the MX200s have a firmware update. Try applying that first.
Also update the BIOS on the server motherboard; perhaps it's not playing nice with part of the hardware config.

Make sure you are deleting the partitions on the drives so everything shows as unallocated, then format again.
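If you'd rather do the wipe from the command line than click through the installer, diskpart can run a saved script with `diskpart /s <file>`. The sketch below only *generates* such a script; it assumes you have already confirmed the right disk number with `list disk`, and the function name, the disk number 1 and the label "Data" are hypothetical placeholders. `clean` removes the partition information so the disk shows as unallocated, and `clean all` additionally overwrites every sector (much slower).

```python
def build_diskpart_script(disk_number: int, label: str = "Data", full_wipe: bool = False) -> str:
    """Return diskpart commands that wipe one disk and create a fresh NTFS volume.

    Hypothetical helper: it builds text only and never touches a disk.
    """
    if disk_number < 0:
        raise ValueError("disk_number must be non-negative")
    commands = [
        f"select disk {disk_number}",          # target disk (verify with "list disk" first!)
        "clean all" if full_wipe else "clean",  # "clean all" also zeroes every sector
        "create partition primary",
        f'format fs=ntfs label="{label}" quick',
        "assign",                               # next free drive letter
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # Only prints the script; nothing is executed against any disk.
    print(build_diskpart_script(1), end="")
```

Save the printed text as, say, wipe.txt and run `diskpart /s wipe.txt` from an elevated prompt. If the goal is a fresh Windows install, you could stop after `clean` and let Setup create its own partitions.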
 


Yeah, I'm thinking the same thing. Either that or the SATA backplane is bad or incompatible with my motherboard; it is more or less ten years old (it too is original to the case). Just now I tried to install Windows, but the process failed. The installer said "Windows cannot be installed on this disk". After two attempts I managed to create enough partitions, but the installer wouldn't format the primary partition and returned error 0x80070057. Interestingly, this happened on both disks. Could both really be bad?
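For what it's worth, 0x80070057 is an HRESULT that wraps a plain Win32 error: the facility field is 7 (FACILITY_WIN32) and the code is 87, ERROR_INVALID_PARAMETER ("The parameter is incorrect."). That is a generic code, so on its own it doesn't distinguish a bad drive from a bad backplane. A minimal decoder, assuming the standard HRESULT bit layout (severity in bit 31, facility in bits 16-26, code in bits 0-15); the small lookup table is just the two errors mentioned in this thread:

```python
# Win32 error codes relevant to this thread (standard Windows values).
WIN32_ERRORS = {
    87: "ERROR_INVALID_PARAMETER (The parameter is incorrect.)",
    1117: "ERROR_IO_DEVICE (The request could not be performed because of an I/O device error.)",
}
FACILITY_WIN32 = 7


def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its severity bit, facility and code fields."""
    hr &= 0xFFFFFFFF  # accept signed values such as -2147024809
    return {
        "failure": bool(hr >> 31),       # bit 31 set means an error
        "facility": (hr >> 16) & 0x7FF,  # bits 16-26
        "code": hr & 0xFFFF,             # bits 0-15
    }


info = decode_hresult(0x80070057)
print(info)  # {'failure': True, 'facility': 7, 'code': 87}
if info["facility"] == FACILITY_WIN32:
    print(WIN32_ERRORS.get(info["code"], "unknown Win32 error"))
```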
 
I hooked the SSDs up to a SATA-to-USB device (I am short on Windows computers). According to CrystalDiskInfo, both drives are fine. Crucial's own Storage Executive would not detect them. Since both CrystalDiskInfo and Diskpart detected them, I am going to chalk it up to the program not playing well with the external connection. My next step is to put these drives in a different computer and see if I can install Windows.
 
I *think* I solved the problem. After connecting the SSDs directly to the motherboard didn't help, I decided to try the Diskpart clean command. One drive (incidentally, the one I had been installing Windows on) failed with an "I/O device error", so it may have an issue. The other went fine, and I was able to install Windows Server 2012 on it. I'm going to do some further testing before setting up my server, but hopefully the problems are under control.
 


Did you ever do a firmware upgrade on them?
 


Yeah, but it didn't help. I still ran into the same issues after upgrading the firmware. Until Windows installed, my fear had been that the Crucial SSDs were incompatible with the Supermicro motherboard. Curiously, the server would not boot from the CD I burned the driver ISO to; the upgrade only worked when the drives were plugged into another machine.

In any case, with Windows now installed on my server, I can download Crucial's Storage Executive and other utilities to examine the drives while they are in the machine. As I said, I am short on Windows computers at home (the USB-to-SATA device mentioned above is at work), so being able to work on the drives while they're in the server makes things easier. If the second one still won't format, it might have gone bad, at which point a call to customer service will be in order.

 
According to Crucial, the SSDs are incompatible with my motherboard. I didn't think it made a difference, but perhaps it does. Today I discovered that the working drive is throwing Event ID 153 errors, so perhaps the drives don't play well with the board. I do have a WD SE hard drive, but according to Supermicro it is not compatible either. I'm going to weigh my options and decide whether to risk continuing or sell the working parts on eBay.
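Event ID 153 (source "disk") means Windows had to retry an I/O operation, and the message names the disk it happened on, which helps tie the warnings to a specific SSD (or a specific backplane slot). A rough sketch for tallying them: export the events as text, for example with `wevtutil qe System /q:"*[System[(EventID=153)]]" /f:text > events.txt`, then count retries per disk number. The regex assumes the description reads like "...for Disk 0 (PDO name: ...) was retried."; adjust it if your exported text differs. The sample below is made-up illustration text, not a log from this thread.

```python
import re
from collections import Counter

# Assumed wording of the Event 153 description ("... for Disk N (PDO name: ...) was retried.").
DISK_PATTERN = re.compile(r"for Disk (\d+)", re.IGNORECASE)


def count_retries_per_disk(event_text: str) -> Counter:
    """Count Event ID 153 retry messages per disk number in exported log text."""
    return Counter(int(m.group(1)) for m in DISK_PATTERN.finditer(event_text))


# Made-up sample in the assumed format (not real log data).
sample = """\
Event ID: 153
Description:
The IO operation at logical block address 0x1d2f0 for Disk 0 (PDO name: \\Device\\00000030) was retried.
Event ID: 153
Description:
The IO operation at logical block address 0x1d300 for Disk 0 (PDO name: \\Device\\00000030) was retried.
"""

print(count_retries_per_disk(sample))  # Counter({0: 2})
```

The disk numbers match what `list disk` in diskpart and Disk Management show, so a count that piles up on one number points at that drive or its slot.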