Question: Hypothesis check - Better not to defrag an archive drive?

MaxT2

Apr 14, 2021
I would like to check a hypothesis:
Context:
- Magnetic hard drives
- Archive formats that include any kind of recovery data (could be RAR's recovery record, could be PAR files...)
- File corruptions happen from time to time.
- Talking about archive drives, i.e. drives kept outside of systems most of the time and rarely accessed, except to fetch some specific data/files from time to time (but data may sometimes be reorganised; I mean it's not all written at once).
- Not taking into account the recommendation to occasionally just re-write data so it doesn't "fade".

My hypothesis is that it would be better not to defrag the drive. The reasoning: if some consecutive part of the drive gets damaged, then since each file has a chance of being spread a bit everywhere on the drive, this reduces the chances that a given file has a damaged portion larger than what its recovery data can restore. So not defragmenting results in statistically better chances that the recovery data will actually allow a repair.

It's likely nit-picking ... but does it make sense or is it completely off?
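
For what it's worth, the hypothesis can be sanity-checked with a toy simulation. The sketch below is a rough model I put together, not anything standard: it assumes a drive of N blocks, a single contiguous damaged region at a random position, and a file whose blocks are either one contiguous run (defragmented) or scattered uniformly at random (fragmented); it estimates how often the damage hitting the file exceeds a recovery budget of a few percent of the file. All the sizes and percentages are made-up illustration numbers.

```python
import random

def p_unrepairable(drive_blocks, file_blocks, damage_len, budget_frac,
                   fragmented, trials=2000):
    """Estimate P(blocks of the file hit by damage > recovery budget)."""
    budget = int(file_blocks * budget_frac)
    bad = 0
    for _ in range(trials):
        # One contiguous damaged region at a random position on the drive.
        start = random.randrange(drive_blocks - damage_len)
        damaged = set(range(start, start + damage_len))
        if fragmented:
            # Fragmented file: blocks scattered uniformly over the drive.
            file_set = set(random.sample(range(drive_blocks), file_blocks))
        else:
            # Defragmented file: one contiguous run of blocks.
            fstart = random.randrange(drive_blocks - file_blocks)
            file_set = set(range(fstart, fstart + file_blocks))
        if len(damaged & file_set) > budget:
            bad += 1
    return bad / trials

# Toy numbers: 1,000,000-block drive, 10,000-block file,
# 5,000-block contiguous damaged region, 3% recovery record.
for frag in (False, True):
    p = p_unrepairable(1_000_000, 10_000, 5_000, 0.03, frag)
    print(f"fragmented={frag}: P(damage exceeds recovery data) ~ {p:.4f}")
```

On those toy numbers the fragmented layout comes out ahead for a single file, which matches the intuition; the flip side raised further down (one damaged region touching many files a little) is not modelled here.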
 
If read performance isn't the primary concern, then yes, I would tend to agree. CRC/recovery data is more likely to be successful at recovering lightly corrupted files if the bits are spread around.

Assuming the drive would fail in a particular spot anyway.

For an archive, ideally you just have another copy so that you aren't reliant on a single point of failure.
 
@Eximo :
Thank you
- Performance is not a concern on the drive (indeed)
- I'm not sure at what layer it happens, but files do get corrupted from time to time, even when drives are stored out of the system and the drive surface still tests OK (that's why I mostly test the "upper layer", i.e. the archive files, more often than running a chkdsk).
- I talked about one single drive to simplify the theoretical explanation. It is indeed recommended to have 3 copies of archives. Depending on my budget and the importance of each drive, I have 2 or 3 copies of each. So when testing them and detecting a corrupted file, I restore it from recovery data or from another copy of the drive.
 
If the hard disk doesn't contain a bootable operating system or large database files, I wouldn't recommend defragging.

I used to defrag hard disks running Windows XP using Raxco software, which moved the boot files up to the start of the drive, where read/write speeds were fastest. This improved Windows XP startup speeds slightly.

On data drives, I left them alone.
 
Spreading the data out to reduce the chances of a file's data being progressively corrupted doesn't really matter. If it's anything but a regular plain text file, it's going to need, more or less, all of its data to be viewed properly. This is especially the case if the data was transformed in some manner via a file compression or encryption method.

While there may be some error correction for that data, it's only meant to handle a few bit-flips and not swathes of data.
 
@hotaru.hino Thanks for your reply.
If the text file is in an archive format that can be repaired, the whole point is that I hope I'll get all of its data back at the end of the repair process.

So far I've been able to repair most data corruption that ever happened, and if not I usually still had the file on another copy.

The quantity of data that can be repaired depends on how much space you allow for the recovery data. For example, WinRAR uses 3% by default (if I remember correctly); I usually use 3% and sometimes 5% (I may have gone up to 10% in some rare cases, which is likely overkill).
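
For reference, here is a minimal sketch of how I create and test such archives from a script, assuming the command-line rar tool is on the PATH; the archive and folder names are just placeholders, and the exact switches may vary by version (check rar's built-in help):

```python
import subprocess

archive = "photos-2021.rar"   # placeholder names, just for illustration
source = "photos-2021/"

# Create the archive with a ~3% recovery record (rar's -rr switch accepts
# a percentage or a size, depending on the version).
subprocess.run(["rar", "a", "-rr3%", archive, source], check=True)

# Test the archive; a non-zero exit code means the integrity check failed.
subprocess.run(["rar", "t", archive], check=True)

# If a test ever fails, something like `rar r photos-2021.rar` attempts a
# repair using the recovery record (typically writing a "fixed." copy).
```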
 
if some consecutive part of the drive gets damaged, then since each file has a chance of being spread a bit everywhere on the drive, this reduces the chances that a given file has a damaged portion larger than what its recovery data can restore.
Or you increase the chances of many files getting unrecoverable damage instead of only one or two files...
This is a question that only has an answer in hindsight, once you see the damage on the disk; before any damage happens it's just a crapshoot, and you could lose more or less data either way.

There is a reason that people keep multiple backups.
 
I will say I'm not a fan of putting my data through a file compression/archival tool for backup or long-term storage. The problem is that it adds another layer, yet another program needed to view the data, yet another transformation of that data which could be problematic with a stray bit-flip, among other things. While sure, popular formats like zip, rar, and 7z (among others) are unlikely to go away any time soon, it's still a concern.

So instead my primary staging point for backups, my NAS, uses btrfs which is designed for data reliability.
 
@hotaru.hino
Interesting, I had never heard of BTRFS.
But if it is a form of RAID (some kind of improved RAID, if I understand right?), then it is a "resilience" system, and NOT a backup or archive solution... (which is what the first Google definition makes me think).
Can you remove the drives or do they have to stay in the NAS all the time?

My approach on this is that I prefer to rely on:
  • SATA
  • some standard formats such as .rar/.7z/.zip (all of which have been around since the 90s and have open-source extractors ... yes, unar.exe is open source)
    • In case I would ever need to convert all the files, it would not be too difficult to find a used PC and script/code the conversion.
    • Also, it is much easier to test the files if they are all stored in a few testable formats. In Windows/NTFS, I know a full chkdsk can find no error and yet a file can still be corrupted, while if an archive file passes its test, I'm 100% sure that what's inside isn't corrupt (compared to when it was archived). If files are not in an archive (so say .txt, .jpg, .html, .mp3, .mp4 etc.), in most cases the only way to test them would be to open them and check everything inside (in a text file, for example). I can also script/code the testing; see the sketch after this list.
So, I prefer to rely on that rather than on any specific hardware (which would itself depend on other hardware).
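
As a sketch of what I mean by scripting the testing (it assumes 7-Zip's 7z command-line tool is installed and the archives sit under some root folder; the paths are placeholders):

```python
import subprocess
from pathlib import Path

ARCHIVE_ROOT = Path("D:/Archive")            # placeholder root folder
EXTENSIONS = {".rar", ".7z", ".zip"}

failed = []
for path in sorted(ARCHIVE_ROOT.rglob("*")):
    if path.suffix.lower() not in EXTENSIONS:
        continue
    # `7z t` re-reads the whole archive and verifies every member's checksum.
    result = subprocess.run(["7z", "t", str(path)], capture_output=True)
    if result.returncode != 0:
        failed.append(path)
        print(f"FAILED: {path}")

print(f"{len(failed)} archive(s) failed the test under {ARCHIVE_ROOT}")
```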

But we're getting completely off topic here.
 
@hotaru.hino
Interesting, I had never heard of BTRFS. But if it is a form of RAID (some kind of improved RAID, if I understand right?), then it is a "resilience" system, and NOT a backup or archive solution...
Can you remove the drives or do they have to stay in the NAS all the time?
btrfs is a file system; it can be used in RAID or on standalone drives. It's basically Linux's competitor to ZFS. Could I remove the drives? Probably, but that would take more effort than I'm willing to spend at the moment, because asking that implies I could read them in another system, and I don't have anything prepared that could do that.

What you're thinking doesn't matter in the end. If the drive survives its initial 30 days of the bathtub curve, it's likely going to work fine for years afterwards. And if you're storing it somewhere, the only thing that would be a problem is a bit flip from a random radioactive event such as a gamma or cosmic ray hitting the platter. Assuming a truly even distribution of probability of where said event would happen, it doesn't matter if the data is spread out or all in one place.

Although where the data physically lives on the drive may be a factor: HDDs use CAV, so data further out on the platter tends to be more spread out (I'm aware that HDDs now use zoned CAV to increase the data density on the outside of the platter).

But at the end of the day, you have no control over how the data gets written to the drive and most file system stacks want to put data closer to the center of the platter and together for performance reasons. There may be something you can do to edit this, but I'll leave that as an exercise for the reader.
 
the only thing that would be a problem is a bit flip from a random radioactive event such as a gamma or cosmic ray

Yeah, years ago I had a physicist programming teacher who made jokes about cosmic rays all the time.
But in practice, as I mentioned earlier, even on untouched drives some files do get corrupted from time to time. I suppose that's the reason why it's recommended to sometimes just re-write the data so it doesn't "fade".

you have no control over how the data gets written to the drive and most file system stacks want to put data closer to the center

True, I initially assumed it was completely random, but the system may work against that to some extent. What still introduces randomness in the writes is me reorganising the files (I don't mean just moving files, but rather things like extracting and then merging a given archive's content). I'm not trying to "control randomness" at that level.
 
Or you increase the chances of many files getting unrecoverable damage instead of only one or two files...
This is a question that only has an answer in hindsight, once you see the damage on the disk; before any damage happens it's just a crapshoot, and you could lose more or less data either way.

There is a reason that people keep multiple backups.
Another thing to consider is that if the file system has been damaged, or a file inadvertently deleted, that data may still be intact on the drive, but the map pointing to it is no longer available. File recovery software can potentially be used to recover that data, but if the file is broken up across the drive, it's less likely to be recoverable. Software like PhotoRec, for example, can scan through a drive without a file system, cluster by cluster, searching for file headers to locate files that it can then save to another drive. Fragmented files are likely to break the recovery though, since the software has no way of knowing whether a chunk of a file without a header should be attached to another file somewhere. I believe PhotoRec can attempt to recover some slightly fragmented files, by continuing where it left off after locating an intact file within another one, but most fragmented files are likely to be nonrecoverable by such a method. Or in the case of something like a video or audio file, only a portion of the file will be recoverable, up until the point where the fragmentation occurred.
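
To illustrate why carving-style recovery struggles with fragmentation: tools like PhotoRec essentially scan the raw device for known file signatures and read forward until an end marker or the next header. Here is a stripped-down sketch of that idea for JPEGs (not PhotoRec's actual code, just the general approach):

```python
# Toy signature carver: finds JPEG start/end markers in a raw disk image and
# saves whatever lies between them. Real carvers are far smarter, but the core
# assumption is the same: a file's bytes sit contiguously between its header
# and its footer, which fragmentation breaks.
JPEG_SOI = b"\xff\xd8\xff"   # start-of-image marker
JPEG_EOI = b"\xff\xd9"       # end-of-image marker

def carve_jpegs(image_path, out_prefix="carved"):
    data = open(image_path, "rb").read()   # fine for a sketch; real tools stream
    count, pos = 0, 0
    while True:
        start = data.find(JPEG_SOI, pos)
        if start == -1:
            break
        end = data.find(JPEG_EOI, start)
        if end == -1:
            break
        with open(f"{out_prefix}_{count:04d}.jpg", "wb") as out:
            out.write(data[start:end + 2])
        count += 1
        pos = end + 2
    return count
```

If the first half of a photo is followed on disk by blocks of some other file, the carved output simply contains the wrong bytes after the fragmentation point, which is the failure mode described above.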
 
@cryoburner : Interesting point I had not thought about. Though on my archive drives I rarely delete anything, and whenever I delete something, I always "soft-delete" it to the Recycle Bin (or even sometimes to a temporary "To be deleted" folder first, and then to the Recycle Bin). I hadn't really thought about it; I do that kind of thing "intuitively". (Though sometimes some files are too large for the bin.)
But I can see many situations where this can apply and may be kept in mind regarding drives in general.
 
I personally assume that any file larger than 4096 bytes, and therefore occupying more than one block on a disk, could well be fragmented. The operating system tends to scatter file fragments around wherever it thinks fit.

The various utilities I use to recover deleted files, e.g. Piriform Recuva, do a pretty good job, provided no blocks have been overwritten. It doesn't matter how fragmented they are, provided all the blocks are there.

Recuva groups deleted files into three types:

1) Green. No blocks overwritten. Good chance of recovery.
2) Amber. Some blocks overwritten. Poor chance of recovery.
3) Red. All blocks overwritten. No chance of recovery.

On hard disks, I'm much happier if the files are written at the outer edge of the platters. It makes data retrieval twice as fast as for files written on the innermost sectors. Any hard disk surface test utility will show transfer rates in the middle are half those at the edge, because there are only half as many sectors/blocks per track near the spindle.

Transfer rates are crucial when you're trying to back up from hard disk to LTO (Linear Tape Open), as I am at this moment. I need to maintain a constant write speed of 80MB/s to LTO4 to prevent the tape drive from "shoe shining". This means I need a hard disk capable of at least 160MB/s at the outer edge. It also helps if you only back up large files, over 15MB each, to tape. Anything less and it's shoe-shining time.
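
The arithmetic behind that figure, spelled out (the 80MB/s streaming minimum for LTO4 and the roughly-half inner-track speed are taken from the paragraph above; the numbers are approximate):

```python
lto4_min_stream = 80      # MB/s needed to keep LTO4 streaming (no shoe shining)
inner_vs_outer = 0.5      # innermost tracks deliver roughly half the outer-edge rate

required_outer_edge = lto4_min_stream / inner_vs_outer
print(f"Outer-edge rate needed: ~{required_outer_edge:.0f} MB/s, so even the "
      f"slowest (innermost) region still sustains {lto4_min_stream} MB/s")
```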
 
You DO have a good backup routine, correct?
Yes but:
- I want the thread to stay focused on this precise question: "Is letting an archive disk get fragmented theoretically better (to whatever extent) in case data repair is needed, rather than defragmenting it?"
- I'm talking about archives, cold-stored... so the rule is "have 3 copies of each archive volume" (I sometimes only have 2 due to the budget I want to allocate to those) ... but in my understanding, archives are not supposed to get extra backups; backups are for what is currently in the "live" system, or whatever ongoing project.
 
Yes but:
- I want the thread to stay focused on this precise question: is letting the disk get fragmented theoretically better (to whatever extent) in case data repair is needed?
- I'm talking about archives, cold-stored... so the rule is to have 3 copies of each archive volume (I sometimes only have 2 due to budget) ... but to my understanding, archives are not supposed to get extra backups; backups are for what is currently in the "live" system.
3-2-1 is for any data you do not wish to lose.
A single copy of something in "cold storage" may be considered to not exist at all.

A lot of the discussion was centered around potential data loss or corruption while defragging. With a good backup routine, that is a non-issue.
 
OK, to return to the topic: I no longer defrag any of my hard disks, now that I boot from SSD. In the olden days (Win 95, 98, NT4 and XP), I used to defrag the OS drive every 3 months, to speed up boot times after Windows Updates scattered the OS files all over the disk.

As drives start to fail, especially over groups of contiguous sectors, I feel it's better to avoid copying data from good sectors to potentially bad sectors during a defrag.
 
Another thing to add: defragging hasn't really been necessary for hard drives unless fragmentation reaches severe levels, like 70%+. The reason is that SATA introduced a method that allows the head to swing around in a manner requiring fewer rotations of the platter to get all of the data (this feature is known as NCQ).

So rather than trying to go to locations A, B, C, and D, which may be all over the place, the drive may pick up data in the order B, D, A, C, because the head can simply move to the track where the next piece of data is the moment it has picked up the previous one.
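
A tiny illustration of the reordering idea (this is just an elevator-style reordering of track positions, a simplification of what NCQ actually does in firmware; the track numbers are arbitrary):

```python
def head_travel(track_positions, start=0):
    """Total head movement when servicing requests in the given order."""
    travel, head = 0, start
    for track in track_positions:
        travel += abs(track - head)
        head = track
    return travel

requests = [800, 50, 600, 120]   # arbitrary track numbers for A, B, C, D
print("arrival order (A, B, C, D):", head_travel(requests))
print("reordered (elevator-style):", head_travel(sorted(requests)))
```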
 
As drives start to fail, especially over groups of contiguous sectors, I feel it's better to avoid copying data from good sectors to potentially bad sectors during a defrag.
This reminds me that some time ago I used "O&O Defrag" (which I would no longer recommend). I used it because I liked the presentation, but I didn't let it auto-defrag everything as a "background service" ... that was until some update made de-activating auto-defrag impossible. I still kept it installed for some time because I was "used to having it" ... but after I uninstalled it I think I noticed a performance improvement (in the sense that the PC sounded/felt less busy all the time). And I'm quite convinced that a good amount of the data corruption I had during the years I was using it was caused by this software...
 
The various utilities I use to recover deleted files, e.g. Piriform Recuva, do a pretty good job, provided no blocks have been overwritten. It doesn't matter how fragmented they are, provided all the blocks are there.
I believe Recuva mainly references the locations of files via the filesystem though. If the filesystem itself is damaged, then the software needs to rely on its "deep scan" feature, which should similarly have difficulty recovering fragmented files, since it lacks a record of where those various fragments are located.