You are simply not listening (...)
Most of your post is off topic and assumes that I don't already know everything you're saying and that I don't already do most of it.
I just cut back to having only 2 copies of certain data instead of 3, because that data is less important and the whole setup had already become expensive enough, and I'm aware that this increases risk to some extent. I agree that having 3 copies is ideal.
This seems to have come down to performance optimization,
Correct, it's a question of optimisation for a choice that I have to make now, a choice that probably does not have huge consequences but that I need to make at this point... Which is choosing the Allocation Unit Size while I'm moving two 10 TB archive volumes to one 26 TB archive volume (two copies in this case).
a way to more quickly recover the data than pulling it from backups,
Not exactly. What I'm hoping for is to select an Allocation Unit Size that can make the (bi-annual/yearly) RAR testing of the entire drive slightly faster. Previously it could already take more than 24 hours for a drive to get tested, and a bigger drive will take even longer. (On the other hand, I upgraded one of my computers recently, but I'd rather run these long test sessions on one of my secondary computers.)
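For scale, the test session is basically this loop, timed per archive so I can compare runs before and after the Allocation Unit Size change (a minimal sketch, assuming the free `unrar` CLI is on PATH; the mount point is made up, and multi-volume sets would need to be tested once per set, not per file):

```python
import pathlib
import subprocess
import time

ARCHIVE_ROOT = pathlib.Path("/mnt/archive")  # hypothetical mount point

def test_archive(path: pathlib.Path) -> tuple[bool, float]:
    start = time.monotonic()
    # 'unrar t' decompresses everything in memory and verifies the checksums
    result = subprocess.run(["unrar", "t", str(path)], capture_output=True)
    return result.returncode == 0, time.monotonic() - start

for rar in sorted(ARCHIVE_ROOT.rglob("*.rar")):
    ok, seconds = test_archive(rar)
    print(f"{'OK ' if ok else 'BAD'} {seconds:8.1f}s {rar}")
```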
by knowing whether the cluster size can help or hinder using the RAR recovery record if this particular form of corruption occurs.
Yes.
I believe OP realizes that if bad sectors appear or at least if they are increasing over time, moving the data off of that drive should become a priority,
Yes, and that's why I'm merging these archive volumes: one of the drives started showing corruption.
(And I'll resell the remaining OK drives or use them for something else. I run a secure-erase process on them before reselling.)
On the other hand, file/RAR-level corruption happens much more often, and I simply live with it; otherwise I would have to replace drives much too often.
(I think it can simply be caused by things like the system crashing.)
but if it takes 15 times as long to pull it from a backup as it does to transfer it from drive to drive, being able to repair and verify the data on the drive becomes a reasonable desire.
Usually recovering a file takes a long time only because the other copy is stored in another place most of the time.
But also, most of the time RAR is simply able to repair the archive, so I don't even need one of the extra copies right away.
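For context, the repair itself is just WinRAR's `r` command; a minimal wrapper sketch (the exact name of the repaired output file, like `fixed.*.rar`, depends on the WinRAR version and whether a recovery record is present):

```python
import subprocess
import sys

def repair_archive(path: str) -> bool:
    """Run 'rar r', which rebuilds the archive; with a recovery record
    present it can usually fix localized corruption on its own."""
    result = subprocess.run(["rar", "r", path], capture_output=True, text=True)
    return result.returncode == 0

if __name__ == "__main__":
    ok = repair_archive(sys.argv[1])
    print("repaired" if ok else "repair failed, falling back to another copy")
```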
I assume the only reason for keeping all this on an "archive drive" is quick access versus getting the files from your other backup methods, and maybe the cost of those other methods.
Mostly to keep control over my stuff. Then "quick access", yes: I always have one of the copies at home that I can just plug into one of my computers when I need to access some archive.
I don't remember if it was on this forum (maybe it was on a subreddit), but we concluded relatively recently that the cost of good hard drives and the cost of good cloud storage were often similar in money/TB, with some variation over time. Except the cloud comes with more constraints... upload time, access time... (I'm not even sure my ISP would let me upload 50+ TB.)
I don't want to rely on "backup solutions" for various reasons: archives, backups, and system resilience are different things, and I don't want any black-boxy proprietary format or software deciding what to do with my files. (Note that RAR is proprietary and not open source, but UNRAR is open source... and I think it has a long enough history of compatibility, and if I should ever change formats, automating a mass conversion would be rather easy.)
Honestly, if you have full copies of the files, and could recover one or two that got corrupted, just generating MD5 hashes of each of them and comparing them every few months seems like it would be way easier and reliable,
I think that's what RAR testing does; it probably compares hashes, or something similar. I never tried to investigate whether there was a faster method, which would be interesting, but so far I've assumed RAR knows what it's doing with its own format.
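If I ever wanted to compare, the manifest approach would look something like this (a sketch; the `manifest.json` name and the chunk size are my own arbitrary choices):

```python
import hashlib
import json
import pathlib
import sys

def md5_of(path: pathlib.Path, chunk: int = 1 << 20) -> str:
    """Hash a file in chunks so huge archives don't fill RAM."""
    h = hashlib.md5()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def build_manifest(root: pathlib.Path) -> dict:
    return {str(p.relative_to(root)): md5_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

if __name__ == "__main__":
    root = pathlib.Path(sys.argv[1])
    manifest_file = pathlib.Path("manifest.json")
    current = build_manifest(root)
    if manifest_file.exists():
        # Diff against the previous run to spot silent corruption
        previous = json.loads(manifest_file.read_text())
        for name, digest in previous.items():
            if current.get(name) != digest:
                print("CHANGED or MISSING:", name)
    manifest_file.write_text(json.dumps(current, indent=1))
```

Though I'd still lose the repair ability that the RAR recovery record gives me.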
(...) But if your RAR test identifies only one or two bad files it seems like it would be just as simple to pull them from the other backup than to try to repair anyway.
It depends on where the other copies are. Actually, what I often do is repair the file first, and then at some synchronisation moment I restore it from the other copy so that both keep the same create/modify dates.
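The restore step is basically just a timestamp-preserving copy (a sketch with made-up paths; `shutil.copy2` carries over the modification time, though the creation date is filesystem-dependent):

```python
import shutil

# Overwrite the locally repaired archive with the pristine off-site copy;
# copy2 preserves the source's modification time (mtime).
shutil.copy2("/mnt/offsite_copy/photos-2019.rar",
             "/mnt/archive/photos-2019.rar")
```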
Everyone is simply trying to give advice that OP doesn't desire.
Yeah, but that's something I've often observed with precise technical topics... I come asking a very precise question and try to give just enough context to pinpoint it. But maybe I overexplain? Then some people will disagree with some element of the context (which is OK, the issue is when it derails the conversation), or enumerate all their advice/knowledge as if they were teaching their grandma and as if I didn't already know it. Of course it's sometimes an occasion to actually learn something or refine some knowledge, but again, the issue is when it derails the conversation.
(I've tried to keep most of my responses focused on the cluster reading issue.)
Indeed, thanks.
So back to the topic...
I currently think that a larger Allocation Unit Size may increase the severity of potential future data corruption. I need to re-check the "study" I posted in an earlier reply to see whether this increase is significant or not. (But I won't check that before tomorrow; it's too late here right now.)
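In the meantime, here is the back-of-the-envelope picture of my worry (all numbers are illustrative assumptions: that a read failure surfaces at cluster granularity, and that an N% recovery record can repair roughly that fraction of an archive):

```python
BAD_SECTORS = 50           # bad 4 KB sectors, worst case each in a different cluster
ARCHIVE_SIZE = 50 * 10**9  # 50 GB archive
RECOVERY_RECORD = 0.03     # 3% recovery record ~= repairable-damage budget

for cluster in (4096, 16384, 65536):  # candidate Allocation Unit Sizes
    worst_case_loss = BAD_SECTORS * cluster
    budget = ARCHIVE_SIZE * RECOVERY_RECORD
    verdict = "within" if worst_case_loss <= budget else "EXCEEDS"
    print(f"{cluster // 1024:>2} KB clusters: up to {worst_case_loss / 1e6:5.1f} MB "
          f"unreadable, {verdict} the ~{budget / 1e9:.1f} GB repair budget")
```

So a bigger cluster size multiplies the worst-case loss, but it may still stay far below what the recovery record can absorb; that's what I want to verify against the study.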