SSD disappearing after it's been on a while

Mighty

This is the same system as in this thread. Short version: When I swapped to a new motherboard, the first time I turned it on I saw a flash and some smoke. I thought I had narrowed the flash down to a stray strand of wire from a sloppy solder job. I looked closely and saw no visible damage to the motherboard.

The system has been running for about six weeks, now. I thought it was pretty stable. I run SETI@Home, so the CPU is maxed out most of the time.

Today, I walked in to a black monitor. A warm reset went through the BIOS logo and ended up back on a black screen.

When I go into the BIOS, the boot drive, the SSD, is not in the list of available devices. I have to do a cold boot, and then it's visible. I have to set it manually in the BIOS. Then, it boots fine.

It has rebooted twice tonight while I've been using it. It runs for a couple of hours and then Windows crashes. One time, it sat on the blue screen, trying to write a diagnostic file, for a couple of minutes. Then it rebooted. I think the first time it just went straight to the reboot.

I have not let the system cool down between boots. I just power it off for ten secs and then fire it right back up.

My two guesses are: Either the motherboard actually did get damaged in the incident I mention in the other thread. Or, the SSD is dying.

The SSD is an ADATA 512 GB. It's the second one I've had in this machine. It was a warranty replacement installed in Nov '15. The first one lasted about 22 months. This one is about 21 months old. So, that's a little suspicious. Then again, an SSD failure is usually an all-or-nothing thing, and this issue is intermittent, so far.

Does anyone have any other ideas on how I can narrow this down? Has anyone actually seen a motherboard lose track of one of the internal SATA drives during a session?

Just happened a third time when I was about to post this. Luckily, I'm typing this up on another machine.

Stop code: Critical_Process_Died. It rebooted before I could double-check that I had copied that correctly.

Thanks,
Drake Christensen
 
Just a follow-up. I stuck the SSD into a SATA-to-USB adapter and ran KeepAliveHD on it. One time it ran overnight. But, Windows did an update and it wouldn't reboot with that plugged in. After I unplugged the adapter and got the pooter rebooted, I fired up KeepAliveHD, again. It ran for about an hour and then the drive disappeared. So, I'm reasonably sure it's the SSD failing.
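
For what it's worth, the test above boils down to poking the drive on a timer and noting when it stops answering. Here is a minimal Python sketch of that idea, using only a read-only directory listing so it adds no write wear. The function names, the 30-second interval, and the drive letter in the usage note are assumptions for illustration; this is not how KeepAliveHD itself works internally.

```python
import os
import time
from datetime import datetime


def drive_present(path):
    """Read-only probe: True if the directory at `path` can be listed."""
    try:
        os.listdir(path)
        return True
    except OSError:
        return False


def watch(path, interval_s=30, max_checks=None,
          probe=drive_present, sleep=time.sleep, log=print):
    """Poll `path` until the probe fails or `max_checks` probes succeed.

    Returns the number of successful checks. When the probe fails,
    a timestamped line is passed to `log` before returning.
    """
    ok = 0
    while max_checks is None or ok < max_checks:
        if not probe(path):
            stamp = datetime.now().isoformat(timespec="seconds")
            log(f"{stamp} {path!r} stopped responding after {ok} good checks")
            return ok
        ok += 1
        sleep(interval_s)
    return ok
```

Something like `watch("E:\\")` (with a hypothetical drive letter) would run until the drive drops out, and the logged timestamp shows how long it lasted. A directory listing may sometimes be answered from the OS cache, so treat a pass as weaker evidence than a failure.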
 
I'm going to open this back up. I plugged the SSD into another machine and imaged it. I was going to send it back for warranty replacement. But, to be thorough, I let it run with KeepAliveHD, just to make sure that it's still failing. It has now run for about 60 hours.

I misspoke above. This drive is only about 12 months old.

Does anyone have any suggestions on how to detect whether this drive really is in the process of failing?

It's an ADATA SX900. I've downloaded the ADATA SSD Toolbox. I captured two screenshots of the S.M.A.R.T. data. I don't see the full editing toolbar here, so I can't embed them. They're at
http://www.mightydrake.com/Files/TechSupport/2017-08-14%2018_44_29-ADATA%20SSD%20ToolBox%20SMART%201.png
and
http://www.mightydrake.com/Files/TechSupport/2017-08-14%2018_44_29-ADATA%20SSD%20ToolBox%20SMART%202.png

Some of the field names are a little frightening, but I think the numbers there are not indicative of a major issue. I've read a coupla threads on SMART data, and from what I gather, none of that looks particularly bad.
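
One way to make the SMART numbers more telling than a single screenshot is to record them twice, some days apart, and look only at what moved. A rough Python sketch of that comparison follows. The attribute names and values in the usage note are made up for illustration and are not taken from the screenshots above; the readings would have to be typed in by hand from whatever tool displays them.

```python
def smart_changes(before, after):
    """Compare two snapshots of SMART raw values.

    `before` and `after` map attribute name -> raw value.
    Returns {name: (old, new)} for every attribute whose value differs,
    using None when an attribute is missing from one snapshot.
    """
    changed = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name)
        new = after.get(name)
        if old != new:
            changed[name] = (old, new)
    return changed
```

For example, calling `smart_changes({"Power-On Hours": 100, "Reallocated Sector Count": 0}, {"Power-On Hours": 220, "Reallocated Sector Count": 0})` would report only the power-on hours. A counter that should normally stay flat but keeps climbing between snapshots would be the thing to look at more closely.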

What else can I be checking?
 
Bump.

It has currently been running about five days. I'm about to leave town for a family reunion along the path of the eclipse. So, that will get it up to about ten days.

Nobody has any suggestions on how to track down this kind of intermittent failure? Is there something else I could be doing which might bring out this behavior, without burning through a bunch of life of the drive?
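
On the wear question: reads don't use up program/erase cycles the way writes do, so one low-wear option might be to read a large existing file over and over and check that the contents hash the same every time. A minimal Python sketch, assuming a big file already sits on the suspect drive (the function names and the 1 MiB chunk size are my own choices):

```python
import hashlib


def file_digest(path, chunk_size=1024 * 1024):
    """Hash a file by reading it in chunks; this only reads, never writes."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def read_passes(path, passes=5):
    """Read the file `passes` times and return the set of distinct digests.

    One digest in the set means every pass returned identical data.
    An OSError (for example, the drive vanishing mid-read) is left to
    propagate so the caller sees exactly when the read failed.
    """
    return {file_digest(path) for _ in range(passes)}
```

If `read_passes` ever returns more than one digest, or raises partway through, that points at the drive or its connection. One caveat: the OS may serve repeat reads from its file cache in RAM, so the file probably needs to be larger than installed memory for the later passes to actually hit the SSD.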