Question Possible hard drive failure - second opinion?

gc123

Commendable
Sep 21, 2020
7
0
1,510
This all happened yesterday:
  • In the morning, the DS4Windows icon on my taskbar had disappeared and it said the executable had been deleted. I thought Windows Defender may have gobbled it but this was not the case. Wasn't overly concerned, reinstalled it without much incident.
  • I installed a Steam game and played it for a few hours, closed it, then came back to find it had crashed with a strange error that I couldn't find anywhere. I tried verifying the installation but it kept complaining of "Corrupt Disk", and indeed a few folders in the game directory were corrupted and could not be opened. Had no luck reinstalling onto the hard drive, but it installed onto my SSD just fine.
  • Tried opening Game 2: Steam complained of a missing executable. The game had totally disappeared, leaving just a .vdf file, and alarm bells started ringing in my head.
  • Checked Game 3: a huge portion of the game files was missing. Started to suspect an issue with the drive. A few games had disappeared altogether, taking the whole folder with them. Game 4 also had substantial portions of its files missing.
  • Massive disparity in the amount of hard drive space in use. Windows Explorer said 1.2TB was occupied when I right-clicked the D: drive, but only 800GB when I highlighted the entire contents of the drive. The 400GB gap seems about right for the number of files that disappeared.
  • Ran chkdsk /f /r: about 11,000 orphaned files, some recovered but many not. chkdsk told me it moved the files it couldn't place into \found.000, but this folder only held 4GB of stuff, nowhere near as much as was still occupied (it should've been about a hundred times that). chkdsk flagged no bad sectors, just a bunch of corruptions and missing records. Game 1's folders were now fully accessible, but many files had still disappeared.
  • Ran sfc /scannow and DISM, no issues with Windows.
  • SMART indicators all nominal. Reallocation count zero. Shut down and went to sleep.
  • Woke up to find Game 4's folder was gone entirely, and Game 3, which I had reinstalled shortly before the chkdsk scan, was missing its executable (though I didn't check whether it had actually reinstalled successfully after the chkdsk). Decided to do a full reformat of the drive since I couldn't reclaim the 400GB of space otherwise. Reformat concluded with no bad sectors flagged and full space available. I'm no expert with hard drives, but if it was a file system issue, a reformat may have resolved it?
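
For anyone wanting to reproduce the space comparison above (space the volume reports as used vs. the summed size of the files that are actually visible), here is a minimal Python sketch. The function names are made up for illustration, and the gap will never be exactly zero on a healthy drive: cluster slack, the Recycle Bin and protected system folders all add to it, so only a very large gap like the 400GB one described here is suspicious.

```python
import os
import shutil


def visible_bytes(root):
    """Sum the sizes of all files reachable by walking `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Entries that cannot be read are skipped; on a damaged
                # file system a lot of these is itself a warning sign.
                pass
    return total


def usage_gap(root):
    """Return (used, visible, gap) in bytes for the volume holding `root`."""
    used = shutil.disk_usage(root).used   # what the volume says is occupied
    visible = visible_bytes(root)         # what can actually be listed
    return used, visible, used - visible
```

Calling `usage_gap("D:\\")` on the affected drive would give the three numbers to compare; on a large drive the walk can take a while.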
It was suggested to me that this was a software issue, and that there is no indication of a faulty drive. (I was under the impression that randomly disappearing files were a sign of a dying drive.) I'm about to reinstall Game 3 and see if it survives a few reboots. The drive suffered trauma months ago, but no issues came as a result.

This drive is used entirely for games, and all non-replaceable data is frequently backed up by Steam Cloud and such, so thankfully I lost nothing. However, I'd like to know which of these applies:
  • I should just replace the drive. (It was a prebuilt, and I would rather pay for a new drive than send the whole system away for a month, keeping the original drive in case I have to RMA the system for something else.)
  • It's an indication of some larger issue with my motherboard, power supply or Windows installation. (My main worry is that my SSD could be corrupted, though I do regularly back up so this wouldn't be a total travesty.)
  • I should just continue as normal unless things start to go awry again.

MBAM scan clear, doesn't seem like malware.

Sorry for the long post, opinions appreciated.
 
SMART indicators all nominal. Reallocation count zero. Shut down and went to sleep.
Sounds like the drive itself may be fine. It's still a good idea to share all the SMART data anyway.

Anyway, check this:
  • See if the SATA cable is properly connected. Also try replacing the cable.
  • Do an HDD stress test and see if it passes. You can use a program such as HD Tune for that. Or you can boot a live Linux distro from DVD or USB stick and use the Disks utility (you should find this on most major distributions such as Mint, Ubuntu, etc).
  • You may also want to check that the RAM is reliable, using Memtest86+. If memory gets corrupted, it can cause corrupted data to be written to disk.
 

gc123

Thanks very much for the reply. This is very reassuring. These are the SMART attributes from CrystalDiskInfo:

-- S.M.A.R.T. --------------------------------------------------------------
ID Cur Wor Thr RawValues(6) Attribute Name
01 100 _48 __6 0000000005BD Read Error Rate
03 _98 _98 __1 000000000000 Spin-Up Time
04 _99 _99 _20 000000000444 Start/Stop Count
05 100 100 _10 000000000000 Reallocated Sectors Count
07 _78 _60 _45 000003AD903A Seek Error Rate
09 _95 _95 __1 EA5A0000120D Power-On Hours
0A 100 100 _97 000000000000 Spin Retry Count
0C 100 100 _20 0000000003DF Power Cycle Count
B4 100 100 __1 0000506E83AE Vendor Specific
B7 100 100 __1 000000000000 Vendor Specific
B8 100 100 _97 000000000000 End-to-End Error
BB 100 100 __0 000000000000 Reported Uncorrectable Errors
BC 100 100 __0 000000000000 Command Timeout
BD 100 100 __1 000000000000 High Fly Writes
BE _65 _52 _40 00002C170023 Airflow Temperature
BF 100 100 __1 000000000000 G-Sense Error Rate
C0 100 100 __1 000000000008 Power-off Retract Count
C1 _83 _83 __1 000000008752 Load/Unload Cycle Count
C2 _35 _48 __1 001100000023 Temperature
C3 100 _64 __1 0000000005BD Hardware ECC recovered
C4 100 100 _10 000000000000 Reallocation Event Count
C5 100 100 __1 000000000000 Current Pending Sector Count
C6 100 100 __1 000000000000 Uncorrectable Sector Count
C7 200 200 __1 000000000000 UltraDMA CRC Error Count
F0 100 253 __0 C9F4000006AA Head Flying Hours
F1 100 253 __0 000258E0D193 Total Host Writes
F2 100 253 __0 0007BD4E483D Total Host Reads
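
A dump like this can be checked mechanically. The sketch below is a rough illustration, not a diagnostic tool: it parses a few lines copied from the table above and applies the usual SMART rule that an attribute has failed when its current normalized value is at or below a non-zero threshold. It also lists the raw counters most often watched on hard drives (05, C4, C5, C6, C7). The function names and the choice of watched IDs are my own assumptions; raw values for attributes such as 01 and 07 are vendor-encoded and are deliberately not interpreted here.

```python
import re

# A subset of the CrystalDiskInfo dump above; "_" is padding, not data.
SMART_DUMP = """\
01 100 _48 __6 0000000005BD Read Error Rate
05 100 100 _10 000000000000 Reallocated Sectors Count
07 _78 _60 _45 000003AD903A Seek Error Rate
C5 100 100 __1 000000000000 Current Pending Sector Count
C6 100 100 __1 000000000000 Uncorrectable Sector Count
C7 200 200 __1 000000000000 UltraDMA CRC Error Count
"""

LINE = re.compile(
    r"^([0-9A-F]{2}) ([_\d]{3}) ([_\d]{3}) ([_\d]{3}) ([0-9A-F]{12}) (.+)$"
)

# Raw counters that are normally zero on a healthy hard drive.
WATCHED_IDS = {"05", "C4", "C5", "C6", "C7"}


def parse_smart(dump):
    """Turn each attribute line into a dict; unrecognized lines are skipped."""
    rows = []
    for line in dump.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        attr_id, cur, worst, thr, raw, name = match.groups()
        rows.append({
            "id": attr_id,
            "current": int(cur.replace("_", "")),
            "worst": int(worst.replace("_", "")),
            "threshold": int(thr.replace("_", "")),
            "raw": int(raw, 16),
            "name": name,
        })
    return rows


def failing(rows):
    """Attributes whose current value is at or below a non-zero threshold."""
    return [r["name"] for r in rows
            if r["threshold"] > 0 and r["current"] <= r["threshold"]]


def nonzero_counters(rows):
    """Watched attributes whose raw counter is above zero."""
    return [r["name"] for r in rows
            if r["id"] in WATCHED_IDS and r["raw"] > 0]
```

For the lines included here both `failing(parse_smart(SMART_DUMP))` and `nonzero_counters(parse_smart(SMART_DUMP))` come back empty, which matches the "all nominal" reading. Clean SMART values do not rule out file system or cabling problems, though.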

Tonight I will reinstall a game and see if it survives a reboot.
  • Will check this tomorrow. Will also try swapping the SATA ports on the board to see if that could be the culprit.
  • I'm a bit put off by the fact that HDD stress tests can shorten the lifespan of a drive. Since the free test is read-only, will this be OK?
  • Ah that's fair. I would have thought that if it was dodgy RAM, that some files on my SSD would also be getting corrupted (particularly since it's being rewritten or at least read almost constantly as my OS drive) but all the corruption was on my HDD - but for fear that I could start to have problems with my SSD I'll do this!
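
The "reinstall a game and see if it survives a reboot" check can be made less manual with a checksum manifest: hash every file after installing, save the result, then hash again after the reboot and compare. This is only a sketch under my own assumptions (the function names and the JSON manifest format are made up for illustration), and a game that patches itself in between will legitimately show changed files.

```python
import hashlib
import json
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash one file in 1 MiB chunks so large game files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to `root`) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def save_manifest(manifest, path):
    """Write the manifest somewhere safe, ideally on a different drive."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)


def load_manifest(path):
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def compare(before, after):
    """Return (missing, changed): files that vanished or whose hash differs."""
    missing = sorted(p for p in before if p not in after)
    changed = sorted(p for p in before
                     if p in after and before[p] != after[p])
    return missing, changed
```

Typical use would be `save_manifest(build_manifest(game_dir), manifest_path)` before the reboot and `compare(load_manifest(manifest_path), build_manifest(game_dir))` afterwards. This only reads the drive, so it should not add any wear beyond a normal full read.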
 

gc123

Small update:

Did more digging and the first report of any file system corruption was weeks ago, with games I haven't played for ages. It did not seem to come after an improper shutdown or noticeable failure. All normal, then the errors started. It had been silently trying to correct file system issues since then. I do distinctly remember at startup one time it checking the D: drive but I thought nothing of it. I am hoping the file system issues won't reoccur after this reformat.

I haven't played any games yet (only got to the menu of them to check that the files it is installing are not corrupt) but I've installed 3. No CHKDSK events in Event Viewer, no disappearing files, but I am only using a tiny proportion of the drive atm. (65GB) Seems very hopeful so far.

SATA cable is in properly. I don't have a USB pen drive on hand (just a backup drive) so I couldn't use Memtest86+ but I ran HCI Memtest on 6 threads for ~2.5 cycles (took an hour or so) and had no errors. Given that, and the fact that my SSD is in constant use with no corruption, the RAM seems clear?

A bit apprehensive about doing a HDD stress test still. Does this all sound fine or is there anything I should be concerned about/look out for? (the principal worry is the issue not lying with the hard drive itself)
 
I do distinctly remember at startup one time it checking the D: drive
This happens if, at boot, Windows doesn't find a flag (I don't remember the correct term right now) that is set to confirm a clean shutdown (i.e. there may be files that Windows did not get time to write to disk, either due to a sudden loss of power or a software freeze followed by a reboot). So rather than being a telltale sign of a failed drive, it just says that Windows did not shut down properly last time. Really the same as you said just before, I'm just filling in.
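
The flag in question is usually called the volume's dirty bit, and Windows can report it with `fsutil dirty query D:` from an elevated prompt. A small Python wrapper might look like the sketch below. The parsing assumes the English wording ("is Dirty" / "is NOT Dirty"), which is an assumption that will not hold on localized Windows installs, and the function names are made up for illustration.

```python
import subprocess


def is_dirty(fsutil_output):
    """Interpret the text printed by `fsutil dirty query` (English wording)."""
    text = fsutil_output.lower()
    if "not dirty" in text:
        return False
    if "dirty" in text:
        return True
    raise ValueError("unrecognized fsutil output: %r" % fsutil_output)


def query_dirty_bit(volume="D:"):
    """Run fsutil on Windows (needs an elevated prompt) and parse the result."""
    result = subprocess.run(
        ["fsutil", "dirty", "query", volume],
        capture_output=True,
        text=True,
        check=True,
    )
    return is_dirty(result.stdout)
```

If `query_dirty_bit("D:")` returns True, Windows will want to check the volume at the next boot, which would explain the startup scan mentioned above.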


I couldn't use Memtest86+ but I ran HCI Memtest on 6 threads for ~2.5 cycles (took an hour or so) and had no errors.
About that: when such a program runs on top of Windows, there will be areas of memory it cannot test, because Windows is using that space.
So I have heard, at least.

A bit apprehensive about doing a HDD stress test still. Does this all sound fine or is there anything I should be concerned about/look out for? (the principal worry is the issue not lying with the hard drive itself)
This argument I don't understand. If an HDD stress test results in a defective HDD, then the drive was already bad.

Still, if you're worried the disk is due to fail, then you should back up all your data. Then it doesn't matter if the HDD does die, as you still have your data intact in the backup. In fact, I'd claim that is the best possible outcome: first, you'd have found the problem, and second, you get to buy a new drive, with less chance of losing recently saved (i.e. not yet backed up) data.
 

gc123

Fair enough - will try to dig out a USB drive I can do memtest86+ off of just to make doubly sure the culprit isn't faulty RAM.

Yeah, to be fair you're right. I'm just concerned about killing it off prematurely if it can be used for a while (just based on stuff I've read before). The data is very replaceable (just a matter of reinstalling a game/restoring saves from cloud) so there's no real concern of data loss, just inconvenience really.

Been playing games and there have been no issues, so fingers crossed! Thanks for the help. Hopefully it was just some singular freak incident weeks ago that caused these issues.
 

gc123

Sorry for the bump.

Precisely the same thing just happened again after weeks of perfect use: 7GB disappeared, and now there's a load of Chkdsk events.

Should I look to RMA the drive?

Edit: Warranty actually expired a week before my original post, so will have to buy a new drive.
 

USAFRet

Titan
Moderator
Try the RMA anyway.

I had a Sandisk SSD die 33 days past the 3 year warranty.
I knew it was past, they knew it was past...then gave me a new drive anyway.
 

DSzymborski

Curmudgeon Pursuivant
Moderator
Yeah, always open an RMA or contact customer service regardless of your warranty. The warranty is a contract covering when they're required to offer you a replacement; there's no limitation on what they can voluntarily offer you past that point. And companies frequently (though obviously not always) offer replacements for things past the warranty date. Though you would have had a better chance if you had done this at the time instead of waiting an additional month.
 

gc123

The problem is that this is a prebuilt, and if they require sending the whole unit away, I'll probably just buy a new drive, since I do work on this machine and don't want weeks of downtime. The website seems to be down atm but from the wording, it does sound like they'd require sending the entire unit. They'd probably also charge for labour, and I don't want to pay an extortionate price for some guy sliding a hard drive out of a drive cage.

Then I'll just have to have fingers crossed this doesn't happen again. Must be a faulty drive at this point, right?