Question: Could Allocation Unit Size impact data integrity on an archive HDD?

Basically, yes.

It is considered corrupted.

The OS will no longer read that sector. And the drive firmware will (hopefully) remap that sector to one of its unused ones.

Expecting the OS to read a partial sector and ignore the bad "bytes" goes against the whole concept of sector-based storage.

Thanks. No, I didn't expect the OS to actually skip bytes... rather to read or not read the whole thing... I think after that, it's rather the role of RAR integrity/hash functions to know whether the received data makes sense in that place.

And so, the OS would recover what can be recovered from the bad sector and write it elsewhere (maybe I missed that point earlier), so in that case, a larger allocation unit size would NOT increase the size of the corruption past the actually corrupted bytes.
 
From https://datarecovery.com/rd/what-are-bad-sectors/
“…As the platters spin, a set of actuator heads float over them on a cushion of air. The actuator heads read and write the magnetic charges on the sectors, then send that information to your computer.

Over time, however, sectors can become unreliable. A sector becomes a “bad sector” when the computer identifies it as permanently damaged. The sector is unusable, and any data within the sector is lost. If a hard drive has a large number of bad sectors, some files may become corrupt or unusable.
 
That part of the text also seems to imply that, even if it's partly corrupted, the OS would actually read the full sector... if I get it right...
I think sectors, clusters, and blocks are still getting mixed up here. A sector is physical (and often called a block as well) - generally on modern drives they are 4KB in size but are emulated as 512 bytes. (This results in some additional reads and writes because the drive has to read and then write a full physical sector to change a single 512-byte logical sector; however, due to the average size of files these days, it ends up not being noticeable most of the time.)

Clusters are logical chunks determined by the OS when formatting, and are composed of one or more physical/logical sectors. Full sectors are always read because that's the smallest unit to read or write. What I think you're reading here is saying that the OS always reads a full CLUSTER, but it turns out NTFS can read and write partial clusters as well, because it knows which LBA addresses are assigned to each cluster. So if cluster B of a file is 2MB in size and is at LBA 4096 to 8191, but the OS only needs a 50KB chunk, it can instruct the drive to only pull the data from LBA 4160 to 4259. (Internally the drive has to translate that to the 4K physical sectors.) NTFS can also just write a single 50KB chunk rather than having to write the full 2MB. (Mechanical drives and SSDs handle this differently internally, but the OS doesn't see it.)
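
To make that arithmetic concrete, here's a minimal sketch in Python using the same illustrative numbers; the function name, the 32KB offset, and the 512-byte logical sector size are my assumptions, not anything the filesystem actually exposes:

LOGICAL_SECTOR = 512  # bytes per logical sector on a 512e drive (assumption)

def lba_range_for_read(cluster_start_lba, offset_in_cluster, length):
    # First and last logical sector (LBA) touched by a partial read of a cluster.
    first_lba = cluster_start_lba + offset_in_cluster // LOGICAL_SECTOR
    last_lba = cluster_start_lba + (offset_in_cluster + length - 1) // LOGICAL_SECTOR
    return first_lba, last_lba

# The example above: a 2MB cluster starting at LBA 4096, and the OS needs a
# 50KB chunk that begins 32KB into the cluster.
print(lba_range_for_read(4096, 32 * 1024, 50 * 1024))  # -> (4160, 4259)

Only 100 of the cluster's 4096 logical sectors get read, which is the point: the filesystem knows the LBA extent of each cluster, so it can ask the drive for just the part it needs.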

The OS keeps track of the clusters and logical sector addresses, while the drive keeps track of the physical sectors and logical sectors. The drive is responsible for translating LBA addresses to physical addresses (a modernized holdover created to allow drives to go over 8GB in size), and now also for tracking the translation from 512-byte logical sectors to 4K physical sectors.

At any rate, this doesn't change the issue you're asking the question about. Yes, Windows can just read part of a cluster, but the underlying functional part of the OS does NOT do that unless it's told to by the application. The application has to request part of a file from the filesystem driver, and the OS figures out where that chunk is stored and retrieves it. If the drive runs into an unreadable sector, the OS doesn't have a provision during normal operation for what to do about that and reports a failure to read the file. It won't even return the data from the sectors that are good unless the application specifically requested that, but the application doesn't know anything about sectors or even clusters, so it can't tell Windows to just work around the bad sector.
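
A rough sketch of what that looks like from the application side (the path and offsets here are invented): the application only deals in byte offsets within a file, and if the drive reports an unreadable sector anywhere in the requested range, the whole read fails; there's no standard way to ask for "whatever good sectors you can salvage".

def read_chunk(path, offset, length):
    # The application asks for a byte range; clusters and sectors are invisible
    # to it. A bad sector anywhere in the range surfaces as an I/O error for
    # the whole request rather than as partial data from the good sectors.
    try:
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(length)
    except OSError as e:
        print(f"read failed for {path} at offset {offset}: {e}")
        return None

data = read_chunk(r"D:\archives\backup.part01.rar", 32 * 1024, 50 * 1024)  # made-up path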
 
And so, the OS would recover what can be recovered from the bad sector and write it elsewhere (maybe I missed that point earlier), so in that case, a larger allocation unit size would NOT increase the size of the corruption past the actually corrupted bytes.
The OS doesn't move any data unless you're actively running a disk check. The drive itself during normal operation will detect that a sector is approaching the threshold of unreadability set by the manufacturer and will decide to reallocate the sector. But that only happens on sectors where the drive is actively reading and writing data. If you run a full disk check, then the OS/utility will check every sector even if it's unused. The manufacturer's tool performing a full SMART scan probably does a better and faster job as it's actually just telling the controller in the drive to do the work, rather than involving the CPU.

Another reason to use the SMART utility to do a full scan is that something like chkdsk does not, as far as I can tell, reallocate bad sectors. It just marks them as bad and tries to move the data to a good sector, but in the end, you have reduced storage capacity because chkdsk isn't replacing the bad sector with one from the spare pool. The SMART data would not show an increase in the reallocated sector count. Chkdsk also does not PERMANENTLY store bad sector data. It just records it in the $BadClus system file, so if you wipe the partition, the record of which sectors were marked bad is lost. (This is all done at the filesystem level, not the device level.) If chkdsk is finding bad sectors that the firmware doesn't, the drive will probably fail anyway.
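
For what it's worth, the drive-level counters being described (reallocated vs. pending sectors) can be watched with smartmontools; a small sketch, assuming smartctl is installed and using an example device name (adjust for your system):

import subprocess

def print_sector_counters(device):
    # "smartctl -A" prints the SMART attribute table; attribute 5 is
    # Reallocated_Sector_Ct (sectors remapped to the spare pool) and 197 is
    # Current_Pending_Sector (sectors the firmware hasn't judged yet).
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "Reallocated_Sector_Ct" in line or "Current_Pending_Sector" in line:
            print(line)

print_sector_counters("/dev/sda")  # example device name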

Chkdsk of course does still have its uses, as filesystem-level corruption can occur where there is no hardware failure.
 
But that's absolutely not the right approach to it. I'm trying to understand if I can lower the risk of getting corrupted data... while hoping to keep performance somewhat optimised (because a full integrity test (not talking about chkdsk or such, but about unrar.exe t ...) of a whole 10 to 14 TB disk already takes 24+ hours).

The goal would ideally be preservation of 100% of the files, and I think I have achieved preservation of 99.9999999% of my archives over more than 20 years.

What you are saying is that when enough files are damaged, one would replace the disk. In this topic, you should think about the damage to the files first; the disk itself is the element that doesn't matter, because it's just the tool and it is indeed replaceable (hopefully not too often).



I will have nightmares now.
You are simply not listening, or not hearing, the very good advice you are getting from several directions.

One, if ANY part of a drive is bad, you SHOULD assume the entire drive is bad, at least in terms of whether that drive is any longer trustworthy, because it is not. Is it still "usable"? Well, yes, technically. Is it trustworthy? No. Absolutely not.

Two, which feeds into "One", is the fact that you've decided that the idea of a backup as the ACTUAL way to avoid data loss is not acceptable. All this says is that you've given up on 50 years of data preservation techniques and decided that you know better than they do. Or we do. Or anybody does. Because nobody in their right mind would ever deny that having EVERY SINGLE important thing you can't afford to lose in AT LEAST three places at any given time is the ONLY way to 100% ensure you will not have a loss of data. You can do whatever you want to any drive and use any methodology you want, and you still cannot make any drive impervious to data loss. Even instantaneous data loss in some cases. A drive could be ten years old or ten minutes old and still fail entirely and in the same way.

The bottom line is, the minute you know something, ANYTHING, is wrong with a drive, whether it be a head problem, a surface issue, losing sectors, or whatever, you should immediately begin moving forward with plans to relegate that drive to either warranty or replacement. And anybody who has even moderate concerns that losing the information on that drive would be cry-worthy should already have that information in at least two other places, whether that's another internal drive, an external drive, a NAS, or a cloud backup, so that it would not matter.

Any answer you get other than that is simply somebody trying to pat you on the back and give you the answer you are looking for. They are not actually trying to help you avoid a catastrophe.
 
You are simply not listening, or not hearing, the very good advice you are getting from several directions.
I don't think that's the case. Everyone is simply trying to give advice that the OP doesn't desire. (I've tried to keep most of my responses focused on the cluster reading issue.) He's said he already has a backup strategy. This seems to have come from performance optimization: a way to more quickly recover the data than pulling it from backups, by knowing whether the cluster size can help or hinder using the RAR recovery record if this particular form of corruption occurs. I believe the OP realizes that if bad sectors appear, or at least if they are increasing over time, moving the data off of that drive should become a priority, but if it takes 15 times as long to pull it from a backup as it does to transfer it from drive to drive, being able to repair and verify the data on the drive becomes a reasonable desire.

I assume the only reason for keeping all this on an "archive drive" is quick access versus getting the files from your other backup methods, and maybe the cost of those other methods.

Honestly, if you have full copies of the files and could recover the one or two that got corrupted, just generating MD5 hashes of each of them and comparing them every few months seems like it would be way easier and more reliable, provided they're not so huge as to make pulling them from the other backup location take forever. But if your RAR test identifies only one or two bad files, it seems like it would be just as simple to pull them from the other backup as to try to repair them anyway.
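
If it helps, the MD5-manifest idea is only a few lines of Python; here's a sketch where the archive root and the manifest filenames are just examples:

import hashlib, json, os

def md5_of(path, chunk_size=1024 * 1024):
    # Hash the file in 1 MB chunks so huge archives don't need to fit in RAM.
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    # Map relative path -> MD5 for every file under the root.
    return {os.path.relpath(os.path.join(d, name), root): md5_of(os.path.join(d, name))
            for d, _, files in os.walk(root) for name in files}

current = build_manifest(r"E:\archives")           # example archive root
with open("manifest_current.json", "w") as f:      # save for the next run
    json.dump(current, f, indent=2)
with open("manifest_previous.json") as f:          # manifest saved on the last run
    previous = json.load(f)
changed = sorted(p for p in previous if p in current and current[p] != previous[p])
print("files whose hash changed since last run:", changed)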
 
You are simply not listening (...)

Most of your post is off topic and assumes that I don't already know everything you're saying and that I don't already do most of it.
I just cut back to having only 2 copies of certain data instead of 3, because they are less important and the whole thing has become expensive enough already; I'm aware that this increases risk to some extent. I agree that having 3 copies is ideal.


This seems to have come from performance optimization,

Correct, it's a question of optimisation for a choice that I have to make now, a choice that probably does not have huge consequences but that I need to make at this point... which is choosing the Allocation Unit Size at the time I'm moving two 10 TB archive volumes to one 26 TB archive volume (two copies in this case).

a way to more quickly recover the data than pulling it from backups,
Not exactly. What I'm hoping for is to select an Allocation Unit Size that can make the (bi-annual/yearly) RAR testing of the entire drive slightly faster, because previously it could already take more than 24 hours for a drive to get tested, and a bigger drive will take even longer. (On the other hand, I upgraded one of my computers recently, but I'd rather run these long test sessions on one of my secondary computers.)
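
(For reference, the kind of bulk pass being timed here can be scripted; a sketch that walks a volume and runs unrar's "t" (test) command on every .rar it finds. The drive letter is an example, and multi-part sets may get tested more than once this way.)

import os, subprocess, time

def test_all_rars(root):
    start = time.time()
    failures = []
    for d, _, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".rar"):
                path = os.path.join(d, name)
                # "unrar t" verifies the archive against its stored checksums.
                result = subprocess.run(["unrar", "t", path],
                                        capture_output=True, text=True)
                if result.returncode != 0:
                    failures.append(path)
    print(f"tested in {time.time() - start:.0f}s, {len(failures)} failed archive(s)")
    return failures

bad_archives = test_all_rars("F:\\")  # example volume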

by knowing whether the cluster size can help or hinder using the RAR recovery record if this particular form of corruption occurs.

Yes.

I believe OP realizes that if bad sectors appear or at least if they are increasing over time, moving the data off of that drive should become a priority,

Yes, and that's the reason I'm merging these archive volumes: one of the drives started showing corruption.
(And I'll resell the remaining OK drives or use them for something else. I run a secure erase process if I resell them.)

On the other hand, file/RAR-level corruption happens much more often and I simply live with it; otherwise I would have to replace drives much too often.
(I think it can simply be caused by things like the system crashing.)


but if it takes 15 times as long to pull it from a backup as it does to transfer it from drive to drive, being able to repair and verify the data on the drive becomes a reasonable desire.

Usually recovering a file takes a long time only because one of the copies is stored in another place most of the time.
But also, most of the time, RAR is simply able to repair the archive, so I don't even need one of the extra copies right away.

I assume the only reason for keeping all this on an "archive drive" is quick access versus getting the files from your other backup methods, and maybe the cost of those other methods.

Mostly to keep control over my stuff. Then "quick access", yes: I always have one of the copies at home that I can just plug into one of my computers when I need to access some archive.
I don't remember if it was on this forum (maybe it was on a subreddit), but we concluded relatively recently that the cost of paying for good hard drives and the cost of paying for good cloud storage were often similar in money/TB, with some variation from time to time. Except the cloud comes with more constraints... upload time, access time... (I'm not even sure my ISP would let me upload 50+ TB.)

I don't want to rely on "backup solutions" for various reasons: archives, backups, and system resilience are different things, and I don't want any black-boxy proprietary format or software deciding what to do with my files (note that RAR is proprietary and not open source, but UNRAR is open source... and I think it has a long enough history and compatibility, and if I should ever change format, automating mass conversions would be rather easy).


Honestly, if you have full copies of the files and could recover the one or two that got corrupted, just generating MD5 hashes of each of them and comparing them every few months seems like it would be way easier and more reliable,
I think that's what RAR testing does; it probably compares hashes, or something similar. I never tried to investigate whether there is a faster method (that would be interesting), but so far I've assumed RAR knows what it's doing with its own file format.

(...) But if your RAR test identifies only one or two bad files it seems like it would be just as simple to pull them from the other backup than to try to repair anyway.

It depends on where the other copies are. Actually, what I often do is first repair the file, and then, at some synchronisation moment, recover the file from the other copy so they keep the same creation/modification date.

Everyone is simply trying to give advice that OP doesn't desire.

Yeah, but that's something I've often observed with precise technical topics... I come asking a very precise question. I try to give just enough context that I think is necessary to pinpoint the question. But maybe I overexplain? Then some people will disagree with some element of the context (which is OK; the issue is when it derails the conversation), or enumerate all their advice/knowledge as if they were teaching their grandma and as if I didn't already know it. Of course it's sometimes an occasion to actually learn something or refine some knowledge, but the issue, again, is when it derails the conversation.

(I've tried to keep most of my responses focused on the cluster reading issue.)

Indeed, thanks.



So back to the topic...
I currently think that a larger Allocation Unit Size may increase the severity of potential future data corruption. I need to re-check the "study" I posted in an earlier reply to try to understand whether that increase is significant or not. (But I won't check that before tomorrow; it's too late here right now.)
 