[SOLVED] Large archive disks, cluster size and damage risk?

MaxT2

Commendable
Apr 14, 2021
82
6
1,545
Hello,

Introduction/edit: I'm not sure why it looks like I wrote this in a way that misleads people into believing that they need to give me advice about some recurring corruption problem. That is not what my question is about. The question of the topic, as in bold further in the text, is:
Considering the context described in this post: does using a larger cluster size increase the risk of having larger parts of files getting corrupted?
(And also, please read the whole thread before replying...)


I have very large archive drives with terabytes of data. Everything is in .rar files.

For information and context, these are archives, so my priorities are:
  1. Longevity/integrity/preservation (critical)
  2. Testability (critical)
  3. Cost (money)
  4. Performance (just nice to have, but tasks may take very long so not to be completely ignored)
I regularly (I try to do it every year) batch test all the files on the drives to make sure they are not corrupted. These batch tests take from 6 hours to 36 hours or so depending on the drive.
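(For reference, the batch test essentially boils down to a loop like the sketch below; it's only a sketch, and the drive letter and the path to Rar.exe are placeholders.)

    # Sketch: walk an archive drive and run "rar t" (test) on every .rar file,
    # collecting the ones that fail. Paths below are placeholders.
    import subprocess
    from pathlib import Path

    RAR = r"C:\Program Files\WinRAR\Rar.exe"
    DRIVE = Path("E:/")

    failed = []
    for archive in DRIVE.rglob("*.rar"):
        # "t" tests the archive; a non-zero exit code means damage was found
        result = subprocess.run([RAR, "t", str(archive)], capture_output=True)
        if result.returncode != 0:
            failed.append(archive)

    print(f"{len(failed)} damaged archive(s)")
    for f in failed:
        print(f)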
I think that I observed a performance gain on drives using larger cluster sizes (most of my drives have the default 4K cluster size, but some use a 2M cluster size).
Actually, I think that the performance gain was when copying files rather than testing them.
Sometimes, but very rarely, a file appears corrupted even though the disk appears OK. (This is not the purpose of the topic; I get very few corrupted files, so I'm not really concerned about it.)

Currently, I actually assume that since I mostly have very large files, I can benefit from a large cluster size, but there may be a lot of things I don't really know about what happens behind the scenes.

But here is something I'm wondering:
When using a larger cluster size, is there a risk that more bytes get corrupted at the same time?
Because WinRAR has a "recovery record" (equivalent to parity files/data) which helps repair a file when it's damaged, but it can only repair a file up to a certain amount of corrupted data, depending on the size allotted to the recovery record; if too much data is corrupted, it won't work.
So when using a large cluster size, do I increase the risk of having larger parts of files getting corrupted? (In case of a "bad cluster" or similar.)
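(To make the worry concrete, here is the kind of back-of-envelope comparison I have in mind. It assumes, simplifying a lot, that the recovery record can absorb roughly as much damage as its own size; the archive size and percentage are made up.)

    # Simplified comparison: the size of one lost cluster versus the amount of
    # damage a recovery record of a given percentage could absorb.
    archive_size = 10 * 1024**3                  # example: a 10 GiB .rar file
    recovery_pct = 3                             # example: 3% recovery record
    repairable = archive_size * recovery_pct / 100

    for cluster in (4 * 1024, 2 * 1024**2):      # 4K cluster vs 2M cluster
        share = cluster / repairable
        print(f"one {cluster // 1024} KiB cluster = {share:.4%} of the repairable budget")

If that simplification is right, even losing a whole 2M cluster stays well below the repairable budget in this example, but I may be missing something, hence the question.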

Thank you
 
Last edited:
Do you verify your backup right after creating it? It doesn't sound right that you're seeing corruption 'every now and then'.
Are these drives always powered up and doing other things, or are they disconnected after backup and put in storage? Please list the drives' model/manufacturer/enclosure in question.
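One straightforward way to do that verification, if you want to script it, is to hash the source and the copy and compare; rough sketch, the paths are placeholders:

    # Sketch: compare SHA-256 hashes of the original file and the freshly
    # written copy. Paths are placeholders.
    import hashlib

    def sha256(path, chunk=1024 * 1024):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
        return h.hexdigest()

    src = r"D:\archives\volume01.rar"
    dst = r"E:\archives\volume01.rar"
    print("OK" if sha256(src) == sha256(dst) else "MISMATCH")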
 

MaxT2

Commendable
Apr 14, 2021
82
6
1,545
Well, my question was regarding cluster size: does a larger cluster size increase the risk of larger quantities of bytes being corrupted at the same time?

---

I'm not worried about corruption issues at this point, because when I wrote "now and then", I meant "very rarely".

So this is a bit off topic, but just to reply:
I don't use real enclosures, I use bays such as these: https://images-na.ssl-images-amazon.com/images/I/81ZiQwn+BUL._AC_SL1500_.jpg
When I don't need the archive disks I usually remove them and place them in cases like these (can't copy the image, just search for "Orico hard disk case").
(It may happen that I forget them, or that I postpone removal since it requires shutting down.)

In the past I had many files that came from optical media and that I transferred to Samsung Spinpoint F1 drives. Plus, I wasn't as careful back then. I found some corrupted files at the time, but I was able to recover most of them thanks to WinRAR recovery records.
Now I have moved everything to larger Western Digital Gold disks, except one volume that is on a Western Digital Red (2 copies).


I really haven't found much corrupted data since I moved to the Western Digital disks; I can remember it happening for only 1 to 3 files (for approx. 50 TB of data in total, all copied at least twice; the most important volume has 3 copies). Yesterday I found 1 corrupted file on one of the copies of the data that is on the WD Red.

"Do you verify your backup right after creating it?" -> And yes I do, but it's not at that moment that corruptions happened.
 
Last edited:
Got it. Okay.
I think, technically, you have an increased chance of more data getting corrupted the larger your cluster size is - but the difference won't be much (if any) in real world examples of data corruption because corruption happens so infrequently.

When you say, "I can benefit from a large cluster size," what benefits are you referring to? If you are going to say speed, have you actually done speed tests?
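A crude but easy test is to time copying the same large file onto each drive and compare throughput; quick sketch, the file and drive letters are placeholders (run it twice and ignore the first pass, since OS caching of the source skews the numbers):

    # Sketch: time copying the same large file to two drives and compare MB/s.
    # Paths are placeholders.
    import os, shutil, time

    src = r"D:\testdata\big_archive.rar"
    targets = [r"E:\speedtest.rar",   # e.g. drive formatted with 4K clusters
               r"F:\speedtest.rar"]   # e.g. drive formatted with 2M clusters

    size_mb = os.path.getsize(src) / 1024**2
    for dst in targets:
        start = time.perf_counter()
        shutil.copyfile(src, dst)
        elapsed = time.perf_counter() - start
        print(f"{dst}: {size_mb / elapsed:.1f} MB/s")
        os.remove(dst)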
 

MaxT2

Commendable
Apr 14, 2021
82
6
1,545
"what benefits" -> Yes, speed/performance.
I did not do a complete study, but I once compared two similar disks, one with the default cluster size, the other with 2M clusters.
And yes, I observed some difference.

But I did it long enough ago that I've forgotten the details. As mentioned in the initial post, I don't even remember whether I noticed the benefit when copying files or during the file integrity testing process, but since then I have assumed that larger clusters would be beneficial, that I would format new disks with larger clusters, and that it was not necessarily worth converting existing disks to a larger cluster size.
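(Side note: to check which of my existing disks use which cluster size without reformatting anything, something like this should work on Windows, if I'm not mistaken; the drive letters are placeholders.)

    # Sketch: report the cluster (allocation unit) size of each drive on Windows.
    # Drive letters are placeholders.
    import ctypes
    from ctypes import wintypes

    def cluster_size(root):
        spc, bps = wintypes.DWORD(), wintypes.DWORD()
        free, total = wintypes.DWORD(), wintypes.DWORD()
        if not ctypes.windll.kernel32.GetDiskFreeSpaceW(
                root, ctypes.byref(spc), ctypes.byref(bps),
                ctypes.byref(free), ctypes.byref(total)):
            raise ctypes.WinError()
        return spc.value * bps.value   # sectors per cluster * bytes per sector

    for drive in ("E:\\", "F:\\"):
        print(drive, cluster_size(drive), "bytes per cluster")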
 

MaxT2

Commendable
Apr 14, 2021
82
6
1,545
@kanewolf : Thank you for your reply. As I explained above, I don't really have a concern about current performance or corruption; the actual question of the topic, after describing the context, was: "when using a large cluster size, do I increase the risk of having larger parts of files getting corrupted?"

And I use SATA bays like the ones in the image linked above. I just realised that some have "6.0Gb/s" written on them and others have "3.0Gb/s", yet I'm not sure which SATA controller each of them is connected to, since I have 3 to 4 in each computer (I think I wrote it down in some text file last time). So it's probably a mix of SATA 2 and SATA 3, and this may indeed be a point I could try to improve when I get time... But the most important thing is that the files get on the disks, are testable, and only very occasionally corrupted.
 
Last edited:
It may help to identify the type of corruption you are experiencing. How can you be certain that the problem isn't due to flaky system RAM? If you verified the integrity of the backup at the time it was done, then any subsequent corruption of the HDD would be detected at the physical sector level, except for very rare cases. If you reread the same file, is it still corrupt? What do you see when you compare the good and bad copies in a hex editor (eg HxD, freeware)?
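If you'd rather not eyeball it in a hex editor, a small script can at least tell you which parts of the two copies differ (rough sketch, the file names are placeholders):

    # Sketch: find which 1 MiB chunks differ between a known-good copy and the
    # suspect copy (file names are placeholders).
    CHUNK = 1024 * 1024
    good = r"E:\archives\volume01.rar"
    bad = r"F:\archives\volume01.rar"

    bad_chunks, offset = [], 0
    with open(good, "rb") as a, open(bad, "rb") as b:
        while True:
            ca, cb = a.read(CHUNK), b.read(CHUNK)
            if not ca and not cb:
                break
            if ca != cb:
                bad_chunks.append(offset)
            offset += CHUNK

    print(f"{len(bad_chunks)} differing chunk(s) at offsets:",
          [hex(o) for o in bad_chunks])

If the damage is confined to one or two chunks, the hex editor comparison becomes much quicker.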

HDD specs usually quote an error rate of 1 in 10^14 bits read for desktop drives and 1 in 10^15 for enterprise drives.
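To put those figures in perspective for the roughly 50 TB you mention, a quick expected-value calculation, taking the quoted rates at face value:

    # Rough expected number of unrecoverable read errors when reading 50 TB,
    # using the quoted spec figures as-is.
    bits_read = 50e12 * 8                # 50 TB expressed in bits
    for rate in (1e-14, 1e-15):          # desktop vs enterprise spec
        print(f"URE rate {rate:.0e}: about {bits_read * rate:.1f} expected errors")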

Personally, I don't like to use compression for backups. It's too easy for a relatively minor error (eg 1 bad sector) to trash the entire archive. Maybe WinRAR is less affected by such problems than other archivers, but it still doesn't fill me with confidence.

Have you considered a fault tolerant file system?

This thread has some ideas:
https://superuser.com/questions/490527/is-there-a-file-system-that-does-error-correction
 
Last edited:
Solution

Pc6777

Honorable
Dec 18, 2014
1,125
21
11,465
Corruption matters more or less depending on the type of data you're storing. If it's a bunch of photos and there is a small corruption, it could mess up a few of the photos and everything else will be fine. If it's something like a complex piece of software, 1 flipped bit or missing string could make it unusable, or so unstable you can't use it.
 

MaxT2

Commendable
Apr 14, 2021
82
6
1,545
Thank you for replies.

@fzabkar :
Corruption sometimes happens at the logical level without anything showing at the physical level (don't ask me why, but it does). I do test both, but I test the logical level more often, assuming that if the data is good (which is the most important aspect), the underlying physical locations must be good too.

I think that WinRAR answers both concerns you are mentioning, as it is fault tolerant in the sense that it can repair archives to some extent based on its "Recovery Record" (roughly equivalent to parity files); the amount of repairable damage depends on the size allotted to the "Recovery Record". I think that it is also able to extract files that are not corrupted from a partly corrupted archive (but I don't remember whether this depends on parameters or not).
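(For reference, the command-line pieces I rely on, as far as I remember them, are the -rr switch when creating an archive and the r command to repair one; sketch with made-up paths, assuming Rar.exe is installed at the usual location.)

    # Sketch: create an archive with a recovery record, then repair a damaged
    # copy. Assumes Rar.exe at the path below; all paths are placeholders.
    import subprocess

    RAR = r"C:\Program Files\WinRAR\Rar.exe"

    # "a" adds files; "-rr5" asks for a recovery record (in the RAR5 format
    # this means roughly 5% of the archive size, if memory serves)
    subprocess.run([RAR, "a", "-rr5", r"E:\archives\photos.rar", r"D:\photos"])

    # "r" tries to rebuild a damaged archive using that recovery record; the
    # result is written as a new fixed.* / rebuilt.* file, the original is kept
    subprocess.run([RAR, "r", r"F:\archives\photos.rar"])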

It's interesting; I've never considered using specific file systems, but so far I prefer to stick to file systems that are widely used. I'm not quite sure how they achieve fault tolerance, but I would expect it to come at a cost similar to allotting more space to WinRAR's "Recovery Record".
I would also expect fault tolerance from a file system to be quite silent to the user, whereas I like my software to tell me that an archive is corrupted and that I should re-fetch it from the other copy of the same archive. Also, a fault-tolerant disk won't protect data during transfers, while transferring archives containing their own parity data protects them anywhere.
 
