Context
(Reading note: I often use the words "drive" and "disk" interchangeably; consider them equivalent in this post.)
Hardware
I have big archive drives; most are Western Digital Gold of at least 10 TB (some are a bit different).
Everything is in .RAR archive files (because they are testable and repairable).
Each volume has at least one "twin" that is, most of the time, stored in a different building.
Most of the time, the drives are stored outside of any computer.
I swap them directly into a (non-running) computer using what I call SATA slots, also known as hot swap bays or by other names... they're the same as a NAS's "doors" but on the front of a computer case. (Something like this: https://media.startech.com/cms/products/main/hsb100satbk.main.jpg )
Synchronisation is done by hand. I don't use RAID because, to my understanding, it would make the drives depend on each other: both would have to stay together in the computer, which would prevent swapping them and storing one of the drives in a chest in a different building, and it would bring other downsides. (To my understanding, RAID redundancy is a system resilience solution, not a data archive/backup solution. And I do not want to depend on any archive/backup software.)
Software
(Windows 10, NTFS)
Things I usually do when testing the drives:
- [optional] Check S.M.A.R.T. info with CrystalDiskInfo
- [optional] Check S.M.A.R.T. info with Western Digital Dashboard
- [optional] Run short test with Western Digital Dashboard.
- [optional] Run long test with Western Digital Dashboard.
- Test all the .RAR files
- [optional] Repair broken .RAR files or copy pristine file from the "twin" drive.
- If I have both copies at hand, test disk against twin using WinMerge.
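The "Test all the .RAR files" step could be sketched roughly like this in Python. This is only an illustration under assumptions: it assumes `unrar` is on the PATH and that `unrar t <file>` exits with code 0 when an archive tests clean; the function names `check_archive` and `check_all_archives` are made up for the example.

```python
import os
import subprocess


def check_archive(path, tester=("unrar", "t")):
    """Return True if the tester command exits with code 0 for this archive."""
    # Assumption: the tester only reads the archive and reports any
    # problem through a non-zero exit code.
    result = subprocess.run(
        [*tester, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_all_archives(root, tester=("unrar", "t")):
    """Test every .rar file under root; return (good, bad) lists of paths."""
    good, bad = [], []
    for folder, _subfolders, files in os.walk(root):
        for name in sorted(files):
            if name.lower().endswith(".rar"):
                path = os.path.join(folder, name)
                (good if check_archive(path, tester) else bad).append(path)
    return good, bad
```

Calling `check_all_archives` on the root of the drive being tested returns the two lists, which can then be written to a log on another drive. Any non-zero exit code is counted as "bad" here, which is a deliberately cautious choice.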
What happened
Let me define two disks: "disk being tested" and "its twin disk".
Both are 12 TB Western Digital Gold, each filled to approximately 7 TB.
- Tested all .RAR files on a drive. (It may be useful to mention: I test files using a homemade C# archive-testing UI that usually works well. However, on purpose, it skips (silently, I think) folders that cannot be accessed; otherwise it could have problems with system folders like "System Volume Information". So, reading further, you will understand that it may have missed some folders that it could not access. What is certain is that it doesn't write to or modify the files themselves: it calls the official rar.exe or unrar.exe and only writes text logs on another drive.)
- All tested files were OK.
- WinMerge detected plenty of differences: folders that were present on the twin drive and not on the drive I was testing.
- I checked, and these folders existed on both drives, BUT on the drive that I was testing, they showed a message saying something like "This folder cannot be accessed but is [something] or corrupted..." (not the precise message, I didn't take a screenshot)
- I removed the twin copy.
- I started running chkdsk /r on the drive being tested.
- I saw that the ETA was 306 hours or something and stopped chkdsk; I thought I should try quicker solutions first.
- I checked S.M.A.R.T. info with Western Digital Dashboard; it was reported as "Excellent".
- I ran short test with Western Digital Dashboard: no problem found.
- I ran long test with Western Digital Dashboard: no problem found
- I checked S.M.A.R.T. info with CrystalDiskInfo too: Good
- I re-inserted the twin copy.
- Compared the tested drive and its twin with WinMerge again: not perfect, but much better, much less red in the list. And instead of "corrupted folders", it seems all folders can be accessed. But in a few places, WinMerge detected that some of the .RAR archives differ from one disk to the other.
- Checked those archives that were different (but shouldn't be) (usually very large files). I found that, on the disk being tested, those files were zero bytes. They are fine on the "twin disk".
- So I am now repairing the archives by copying a few files from "twin disk" to "disk being tested".
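Two of the problems in the steps above (folders skipped silently because they cannot be opened, and archives that turned out to be zero bytes) can be detected without reading any file contents. Below is a minimal Python sketch, not the C# tool mentioned above; `scan_tree` is a hypothetical name. It relies on `os.walk` calling its `onerror` callback for folders it cannot list, instead of ignoring them as it does by default.

```python
import os


def scan_tree(root):
    """Walk root; return (unreadable, zero_byte_archives)."""
    unreadable = []  # (path, reason) for anything that could not be read
    zero_byte = []   # .rar files whose size is 0 bytes

    def remember(error):
        # os.walk passes the OSError here instead of silently skipping.
        unreadable.append((error.filename, error.strerror))

    for folder, _subfolders, files in os.walk(root, onerror=remember):
        for name in files:
            if not name.lower().endswith(".rar"):
                continue
            path = os.path.join(folder, name)
            try:
                if os.path.getsize(path) == 0:
                    zero_byte.append(path)
            except OSError as error:
                unreadable.append((path, error.strerror))
    return unreadable, zero_byte
```

A non-empty `unreadable` list means an "all tested files were OK" result only covers the folders that could actually be opened, so it is worth logging next to the test results.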
Edit: I just noticed that the "drive being tested" has a "bootTel.dat" file at its root, created this morning (I need to go and can't investigate what it is right now).
What I still plan to do
- I will still run a test of all the .RAR files on the "disk being tested" and I expect/hope to find no further errors (this may take something like 24h so I won't be 100% sure before tomorrow).
Edit: I actually started this last test and it seems MANY .RAR files are broken. (Current result: 27 good, 11 bad, out of a total of 425 files. I expect that these are files that were inside the folders that were initially "corrupted"... This is gigantic, as I would usually expect somewhere between 0 and 2 bad files on a whole drive, most often 0.) This is despite all the files seeming OK according to WinMerge (fast compare) and the drive being good according to Western Digital Dashboard.
So I think I'm going to re-test the other drive (the "twin" one), reformat the tested drive and re-copy everything.
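Since a fast compare and a full archive test disagreed here, one way to cross-check after re-copying is a full content comparison of the tested drive against its twin. The sketch below is an illustration in Python under assumptions: both drives are mounted at the same time, SHA-256 is an arbitrary choice of hash, and the function names are hypothetical. It reads every byte on both drives, so on 7 TB it will likely take many hours.

```python
import hashlib
import os


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            found.add(os.path.relpath(os.path.join(folder, name), root))
    return found


def compare_trees(tested_root, twin_root):
    """Return (only_on_tested, only_on_twin, different_content)."""
    tested = relative_files(tested_root)
    twin = relative_files(twin_root)
    different = sorted(
        rel for rel in tested & twin
        if file_digest(os.path.join(tested_root, rel))
        != file_digest(os.path.join(twin_root, rel))
    )
    return sorted(tested - twin), sorted(twin - tested), different
```

Files listed in `different_content` have the same relative path on both drives but different bytes, which a comparison based only on names, sizes or dates would not necessarily reveal.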
My questions
- What do you think happened? (I would guess something messed up with the drive's file table, but I don't know much about file tables.)
- What partly repaired it? (Maybe I shouldn't have interrupted chkdsk /r?)
- Should I worry about the drive? Should I replace it? Should I reformat it? Or should I consider that this was an isolated software/system incident (since all tests from Western Digital Dashboard were OK)?