Question: Could Allocation Unit Size impact data integrity on an archive HDD?

Basically, yes.

It is considered corrupted.

The OS will no longer read that sector. And the drive firmware will (hopefully) remap that sector to one of its unused ones.

Expecting the OS to read a partial sector and ignore the bad "bytes" goes against the whole concept of a sector size.

Thanks. No, I didn't expect the OS to actually skip bytes... rather to read or not read the whole thing... I think after that, it's rather the role of RAR integrity/hash functions to know if the received data makes sense in that place.

And so, the OS would recover what can be recovered from the bad sector and write it elsewhere (maybe I missed that point earlier), so in that case, a larger allocation unit size would NOT increase the size of the corruption past the actually corrupted bytes.
 
From https://datarecovery.com/rd/what-are-bad-sectors/
“… As the platters spin, a set of actuator heads float over them on a cushion of air. The actuator heads read and write the magnetic charges on the sectors, then send that information to your computer.

Over time, however, sectors can become unreliable. A sector becomes a “bad sector” when the computer identifies it as permanently damaged. The sector is unusable, and any data within the sector is lost. If a hard drive has a large number of bad sectors, some files may become corrupt or unusable.”
 
That part of the text also seems to imply that, even if it's partly corrupted, the OS would actually read the full sector... If I get it right...
Getting sectors and clusters and blocks mixed up still, I think. A sector is physical (and often called a block as well) - generally on modern drives they are 4KB in size but are emulated as 512-byte logical sectors. (This results in some additional reads and writes because the drive has to read and then write a full physical sector to change a single 512-byte logical sector; however, due to the average size of files these days it ends up not being noticeable most of the time.)

Clusters are logical chunks determined by the OS when formatting, and are composed of one or more physical/logical sectors. Full sectors are always read because that's the smallest unit to read or write. What I think you're reading here is saying that the OS always reads a full CLUSTER, but it turns out NTFS can read and write partial clusters as well, because it knows which LBA addresses are assigned to each cluster. So if cluster B of a file is 2MB in size and is at LBA 4096 to 8191, but the OS only needs a 50KB chunk, it can instruct the drive to only pull the data from LBA 4160 to 4259. (Internally the drive has to translate that to the 4K physical sectors.) NTFS can also just write a single 50KB chunk rather than having to write the full 2MB. (Mechanical drives and SSDs handle this differently internally, but the OS doesn't see it.)
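To make that LBA arithmetic concrete, here's a minimal sketch (hypothetical numbers matching the example above, assuming 512-byte logical sectors) of how a byte range inside a cluster maps to an LBA range:

```python
LOGICAL_SECTOR = 512                 # bytes per (emulated) logical sector
CLUSTER_START_LBA = 4096             # hypothetical: the 2MB cluster from the example above

def lba_range(offset_in_cluster: int, length: int, start_lba: int = CLUSTER_START_LBA):
    """First and last logical sector (LBA) the drive must return to satisfy a
    read of `length` bytes starting `offset_in_cluster` bytes into the cluster."""
    first = start_lba + offset_in_cluster // LOGICAL_SECTOR
    last = start_lba + (offset_in_cluster + length - 1) // LOGICAL_SECTOR
    return first, last

# A 50 KB chunk that begins 32 KB into the cluster -> LBAs 4160 to 4259,
# i.e. 100 logical sectors instead of the whole 4096-sector cluster.
print(lba_range(32 * 1024, 50 * 1024))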

The OS keeps track of the clusters and logical sector addresses, while the drive keeps track of the physical sectors and logical sectors. The drive is responsible for translating LBA addresses to physical addresses (a modernized holdover created to allow drives to go over 8GB in size), and now also for tracking the translation from 512-byte logical sectors to 4K physical sectors.

At any rate, this doesn't change the issue you're asking the question about. Yes, Windows can just read part of a cluster, but the underlying functional part of the OS does NOT do that unless it's told to by the application. The application has to request part of a file from the filesystem driver, and the OS figures out where that chunk is stored and retrieves it. If the drive runs into an unreadable sector, the OS doesn't have a provision during normal operation for what to do about that and reports a failure to read the file. It won't even return the data from the sectors that are good unless the application specifically requested that, but the application doesn't know anything about sectors or even clusters, so it can't tell Windows to just work around the bad sector.
 
And so, the OS would recover what can be recovered from the bad sector and write it elsewhere (maybe I missed that point earlier), so in that case, a larger allocation unit size would NOT increase the size of the corruption past the actually corrupted bytes.
The OS doesn't move any data unless you're actively running a disk check. The drive itself during normal operation will detect that a sector is approaching the threshold of unreadability set by the manufacturer and will decide to reallocate the sector. But that only happens on sectors where the drive is actively reading and writing data. If you run a full disk check, then the OS/utility will check every sector even if it's unused. The manufacturer's tool performing a full SMART scan probably does a better and faster job as it's actually just telling the controller in the drive to do the work, rather than involving the CPU.

Another reason to use the SMART utility to do a full scan is that something like chkdsk does not, as far as I can tell, reallocate bad sectors. It just marks them as bad and tries to move the data to a good cluster, but in the end you have reduced storage capacity, because chkdsk isn't replacing the bad sector with one from the spare pool, and the SMART data would not show an increase in the reallocated sector count. Chkdsk also does not PERMANENTLY record bad sectors. It just stores them in the $BadClus system file, so if you wipe the partition, the list of sectors marked bad is lost. (This is all done at a filesystem level, not device level.) And if chkdsk is finding bad sectors that the firmware doesn't, the drive will probably fail anyway.
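If you want to watch those counters between test runs, here's a rough sketch of one way to do it (it assumes smartmontools is installed and that the drive reports the classic ATA attribute table; the device path and attribute names may differ on your system):

```python
import subprocess

# Attributes worth watching on a mechanical archive drive.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def smart_counts(device: str) -> dict:
    """Run `smartctl -A` (smartmontools) and pull the raw values for a few attributes.
    Assumes the classic ATA attribute table layout; adjust the parsing for NVMe/SCSI."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=False).stdout
    counts = {}
    for line in out.splitlines():
        parts = line.split()
        if len(parts) >= 10 and parts[1] in WATCHED:
            counts[parts[1]] = parts[-1]      # RAW_VALUE is the last column
    return counts

if __name__ == "__main__":
    print(smart_counts("/dev/sda"))           # hypothetical device path
```

If any of those raw values grows between runs, that's the cue to start moving data off the drive.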

Chkdsk of course does still have its uses, as filesystem-level corruption can occur where there is no hardware failure.
 
But that's absolutely not the right approach to it. I'm trying to understand if I can lower the risk of getting corrupted data... while hoping to keep performance somewhat optimised (because a full integrity test (not talking about chkdsk or such, but about unrar.exe t ...) of a whole 10 to 14 TB disk already takes 24+ hours).

The goal would ideally be preservation of 100% of files, and I think I have achieved preservation of 99.9999999% of my archives over more than 20 years.

What you are saying is that when enough files are damaged, one would replace the disk. In this topic, you should think about the damage to the files first; the disk itself is the element that doesn't matter, because it's just the tool and it is indeed replaceable (hopefully not too often).



I will have nightmares now.
You are simply not listening, or not hearing, the very good advice you are getting from several directions.

One, if ANY part of a drive is bad, you SHOULD assume the entire drive is bad, at least in terms of whether that drive is any longer trustworthy, because it is not. Is it still "usable"? Well, yes, technically. Is it trustworthy? No. Absolutely not.

Two, which feeds into "One", is the fact that you've decided that the idea of a backup as the ACTUAL way to avoid data loss is not acceptable. All this says is that you've given up on 50 years of logical data preservation techniques and decided that you know better than they do. Or we do. Or anybody does. Because nobody in their right mind would ever deny that having EVERY SINGLE important thing you can't afford to lose in AT LEAST three places at any given time is the ONLY way to 100% ensure you will not have a loss of data. You can do whatever you want to any drive and use any methodology you want to use, and you still cannot make any drive impervious to data loss. Even instantaneous data loss in some cases. A drive could be ten years old or ten minutes old and still fail entirely and in the same way.

The bottom line is, the minute you know something, ANYTHING, is wrong with a drive, whether it be a head problem or a surface issue, or losing sectors, or whatever, you should immediately begin moving forward with plans to relegate that drive to either warranty or replacement. And anybody who had even moderate concerns that losing the information on that drive would be cry-worthy should already have had that information on at least two other sources, whether that was another internal drive, an external drive, a NAS or a cloud backup, so that it would not matter.

Any answer you get other than that, is simply somebody trying to pat you on the back and give you the answer you are looking for. They are not actually trying to help you avoid a catastrophe.
 
You are simply not listening, or not hearing, the very good advice you are getting from several directions.
I don't think that's the case. Everyone is simply trying to give advice that OP doesn't desire. (I've tried to keep most of my responses focused on the cluster reading issue.) He's said he already has a backup strategy. This seems to have come from performance optimization, a way to more quickly recover the data than pulling it from backups, by knowing whether the cluster size can help or hinder using the RAR recovery record if this particular form of corruption occurs. I believe OP realizes that if bad sectors appear or at least if they are increasing over time, moving the data off of that drive should become a priority, but if it takes 15 times as long to pull it from a backup as it does to transfer it from drive to drive, being able to repair and verify the data on the drive becomes a reasonable desire.

I assume the only reason for keeping all this on an "archive drive" is quick access versus getting the files from your other backup methods, and maybe the cost of those other methods.

Honestly, if you have full copies of the files, and could recover one or two that got corrupted, just generating MD5 hashes of each of them and comparing them every few months seems like it would be way easier and more reliable, if they're not so huge as to make pulling them from the other backup location take forever. But if your RAR test identifies only one or two bad files, it seems like it would be just as simple to pull them from the other backup as to try to repair them anyway.
 
You are simply not listening (...)

Most of your post is off topic and assumes that I don't already know everything you say and that I don't already do most of it (and it also doesn't take my replies into account).
I just cut back to having only 2 copies of certain data instead of 3 because they are less important and the whole thing has become expensive enough already, and I'm aware that this increases the risk to some extent. I agree that having 3 copies is ideal.


This seems to have come from performance optimization,

Correct, it's a question of optimisation for a choice that I have to make now, a choice that probably does not have huge consequences but that I need to make at this point... which is to choose the Allocation Unit Size at the time I'm moving two 10 TB archive volumes to one 26 TB archive volume (two copies in this case).

a way to more quickly recover the data than pulling it from backups,
Not exactly. What I'm hoping for is to select an Allocation Unit Size that can make the (bi-annual/yearly) RAR testing of the entire drive slightly faster. Previously it could already take more than 24 hours for a drive to get tested, and a bigger drive will take even longer. (On the other hand, I upgraded one of my computers recently, but I'd rather run these long test sessions on one of my secondary computers.)
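For what it's worth, here is a minimal sketch of how that bulk testing could be scripted so that only the failing archives need attention afterwards (paths are hypothetical; unrar t is the same test mentioned earlier, and multi-part sets only need their first volume tested):

```python
import pathlib, subprocess, time

def test_archives(root: str, log: str = "rar_test_failures.txt") -> None:
    """Run `unrar t` on every .rar under `root` and record the ones that fail,
    so only those need repair or a re-copy from the other backup."""
    failures = []
    for rar in sorted(pathlib.Path(root).rglob("*.rar")):
        start = time.time()
        result = subprocess.run(["unrar", "t", "-inul", str(rar)])   # -inul: suppress unrar's own output
        status = "OK" if result.returncode == 0 else "FAILED"
        print(f"{status:6} {rar}  ({time.time() - start:.0f}s)")
        if result.returncode != 0:
            failures.append(str(rar))
    pathlib.Path(log).write_text("\n".join(failures))

if __name__ == "__main__":
    test_archives("E:/Archive")     # hypothetical archive drive root
```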

by knowing whether the cluster size can help or hinder using the RAR recovery record if this particular form of corruption occurs.

Yes.

I believe OP realizes that if bad sectors appear or at least if they are increasing over time, moving the data off of that drive should become a priority,

Yes, and that's the reason why I'm merging these archive volumes: one of the drives started showing corruption.
(And I'll resell the remaining OK drives or use them for something else. I run a secure erase process if I re-sell them.)

On the other hand, corruption at the file/RAR level, without corruption in the underlying layers, happens much more often, and I simply live with those; otherwise I would have to replace drives much too often.
(I think they can simply be caused by things like the system crashing.)


but if it takes 15 times as long to pull it from a backup as it does to transfer it from drive to drive, being able to repair and verify the data on the drive becomes a reasonable desire.

Usually recovering a file takes long only because one of the copies is stored in another place most of the time.
But also, most times, RAR is simply able to repair the archive, so I don't even need one of the extra copies right away.

I assume the only reason for keeping all this on an "archive drive" is quick access versus getting the files from your other backup methods, and maybe the cost of those other methods.

Mostly to keep control over my stuff. Then "quick access", yes: I always have one of the copies at home that I can just plug into one of my computers when I need to access some archive.
I don't remember if it was on this forum (maybe it was on a subreddit), but we concluded relatively recently that the cost of paying for good hard drives and the cost of paying for good cloud storage were often similar in money/TB, with some variation from time to time. Except the cloud comes with more constraints... upload time, access time... (I'm not even sure my ISP would let me upload 50+ TB.)

I don't want to rely on "backup solutions" for various reasons: archives, backups, and also system resilience are different things, and I don't want any black-boxy proprietary format or software deciding what to do with my files. (Note that RAR is proprietary and not open source, but UNRAR is open source... and I think it has a long enough history and compatibility, and if I should ever change formats, automating mass conversions would be rather easy.)


Honestly, if you have full copies of the files, and could recover one or two that got corrupted, just generating MD5 hashes of each of them and comparing them every few months seems like it would be way easier and more reliable,
I think that's what RAR testing does; it probably compares hashes, or something similar. I never tried to investigate whether there was a faster method (that would be interesting), but so far I've assumed RAR knows what it's doing with its own format files.

(...) But if your RAR test identifies only one or two bad files, it seems like it would be just as simple to pull them from the other backup as to try to repair them anyway.

It depends on where the other copies are. Actually, what I often do is first repair the file, and then at some synchronisation moment I recover the file from the other copy so they keep the same creation/modification date.

Everyone is simply trying to give advice that OP doesn't desire.

Yeah, but that's something I have often observed with precise technical topics... I come asking a very precise question. I try to give just enough context that I think is necessary to pinpoint the question. But maybe I overexplain? Then some people will disagree with whatever element from the context (which is OK; the issue is when it derails the conversation), or enumerate all their advice/knowledge as if they were teaching their Grandma and as if I didn't already know it. Of course it's sometimes an occasion to actually learn something or refine some knowledge, but again, the issue is when it derails the conversation.

For example... I didn't initially talk about the fact that I already have multiple copies because that is not relevant to the Allocation Unit Size question. But then, as I didn't mention it, people assumed that I was only relying on the RAR repair feature, which is not the case; it's just the aspect I wanted to focus on in this thread. And then they absolutely need to explain it, despite the fact that it has nothing to do with the question. (Or at least not directly.)

(I've tried to keep most of my responses focused on the cluster reading issue.)

Indeed, thanks.



So back to the topic...
I currently think that a larger Allocation Unit Size may increase the severity of potential future data corruption. I need to re-check the "study" I posted in an earlier reply to try to understand whether this increase is significant or not. (But I won't check that before tomorrow; it's too late here right now.)
 
The thing about using MD5 hashes is simply that it leaves the original files alone and creates a hash file. It doesn't use any proprietary file type, and you can find hundreds of MD5 hash generators/comparators. There's probably even some sort of management tool for them so you can generate them in batches and have something like a database. But of course, MD5 can't repair the files, only detect corruption. Repair obviously requires a lot more data to be generated, and I don't know enough about RAR to know how it works on that. (For a long time I haven't understood why people still use RAR, since it was never free. I didn't know it had this recovery capability, though it still seems niche.)
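A minimal sketch of that "hash file as a database" idea (the archive path is hypothetical; any existing hashing tool would do the same job in batch):

```python
import hashlib, json, pathlib

def build_manifest(root: str) -> dict:
    """Walk a directory tree and return {relative_path: md5_hex} for every file."""
    root_path = pathlib.Path(root)
    manifest = {}
    for path in sorted(p for p in root_path.rglob("*") if p.is_file()):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):   # hash 1 MB at a time
                h.update(chunk)
        manifest[str(path.relative_to(root_path))] = h.hexdigest()
    return manifest

def changed_or_missing(old: dict, new: dict) -> list:
    """Files whose hash differs from the previous run, or that disappeared."""
    return [p for p in old if new.get(p) != old[p]]

if __name__ == "__main__":
    current = build_manifest("D:/Archives")            # hypothetical archive root
    manifest_file = pathlib.Path("manifest.json")
    if manifest_file.exists():
        previous = json.loads(manifest_file.read_text())
        print("Changed or missing:", changed_or_missing(previous, current))
    manifest_file.write_text(json.dumps(current, indent=1))
```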

How big are the RAR files you've got? It seems like with sequential corruption and 1% recovery data, up to just under 2MB of loss could be recovered in the liamfoot tests, but the sparse corruption was almost entirely unrepairable even with only a small amount of damage. At 5% recovery data it got a lot better, but there's still a point where there's just too much damage. With 5%, I think even your 2MB cluster size would usually be recoverable, even if the damage crossed a couple of clusters and all of the clusters' data was lost, because they were able to recover up to almost 10MB of loss. (3% might be just as good, probably reaching over 4MB of recoverability.)
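As a rough back-of-envelope check (assuming, as those tests suggest, that a recovery record can absorb roughly up to its own size in damage, which is a simplification that ignores the damage pattern), you can compare recovery capacity against worst-case cluster-aligned loss:

```python
def max_repairable_clusters(archive_bytes: int, rr_percent: float, cluster_bytes: int) -> int:
    """Rough estimate: recovery record size divided by cluster size, i.e. how many
    whole lost clusters the record *might* absorb. Real RAR behaviour depends on
    whether the damage is sequential or sparse (see the test results above)."""
    recovery_bytes = archive_bytes * rr_percent / 100
    return int(recovery_bytes // cluster_bytes)

# Hypothetical 200 MB archive, 3% vs 5% recovery record, 64 KB vs 2 MB clusters.
for rr in (3, 5):
    for cluster in (64 * 1024, 2 * 1024 * 1024):
        n = max_repairable_clusters(200 * 1024 * 1024, rr, cluster)
        print(f"{rr}% record, {cluster // 1024:>4} KB clusters: ~{n} lost clusters repairable")
```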

In truth, if a drive suffers bit rot it's probably going to happen in one small section, maybe a few adjacent sectors, in the time between your tests, so two clusters (1024 physical 4K sectors) might get damaged but probably not more than that. In order to damage multiple other clusters, it would have to be some serious physical damage happening (environmental, or a severe drop while it was running so the heads crash during operation, etc.) and certainly for sparse damage due to hardware to occur there would have to be major mechanical failures with the head bouncing up and down. You probably aren't even going to be able to read the drive at all in those cases. And if you can, depending on the file size, those multiple damaged clusters might not even be within one file so you could potentially still recover some data.

They also noted that the sparse corruption test seemed to actually be damaging the recovery record as well as the original file in some cases, so you'd just be screwed there. And damaged headers just makes the file unrecoverable entirely.

I don't think I'd even be bothering with this method of recovery though, if I have two copies already, unless that second copy is REALLY hard to get to. Any "active" files should be on a system with a regular backup schedule, and only removed when both archive drives have received the new data. If the on-site drive does have a problem and you lose a file, do you need it so soon that the time to get the other drive is unbearable? (Don't forget to rotate which one is on-site and which one is off, so one isn't getting more read activity than the other.)

That said, I think 2MB cluster sizes are likely acceptable in terms of recoverability if you go with 5% recovery data, even 3%. It still leaves some chance of unrecoverable files, but only if there is a lot of corruption. If there is a physical defect causing that much corruption, I might not even want to use any of the data on that drive and just copy the good archive drive to a fresh new one.

Incidentally, if you got something other than a conventional magnetic recording (CMR) drive, such as a shingled (SMR) drive, the potential for more widespread damage probably increases massively, because the physical tracks overlap each other. Damage to one could affect bits on two other tracks.
 
In truth, if a drive suffers bit rot it's probably going to happen in one small section, maybe a few adjacent sectors, in the time between your tests, so two clusters (1024 physical 4K sectors) might get damaged but probably not more than that. In order to damage multiple other clusters, it would have to be some serious physical damage happening (environmental, or a severe drop while it was running so the heads crash during operation, etc.) and certainly for sparse damage due to hardware to occur there would have to be major mechanical failures with the head bouncing up and down. You probably aren't even going to be able to read the drive at all in those cases. And if you can, depending on the file size, those multiple damaged clusters might not even be within one file so you could potentially still recover some data.
See my Toshiba drive results above.
Went up to 14k+ bad sectors quite quickly.
Just sitting in the NAS enclosure.
 
See my Toshiba drive results above.
Went up to 14k+ bad sectors quite quickly.
Just sitting in the NAS enclosure.
Yeah but did you find out WHY? That doesn't really sound like normal bit rot, though it could have been a major manufacturing defect that was brought to light by the warmth of running in the enclosure and eventually caused failures. Like paint bubbling off of a surface. Of course there is always a chance of catastrophic failure like that, but it's a much lower chance than your average few bad sectors appearing.
 
I don't think I'd even be bothering with this method of recovery though, if I have two copies already,
Exactly the point. The rest of this is just trying to find solutions for a problem that was solved long ago. If you trust a drive that has already exhibited sector failures or bit rot, that's your call, but there is no magic bullet that is going to give you the result you "desire". It's just a waste of time and energy when better solutions already exist.

like it would be just as simple to pull them from the other backup as to try to repair them anyway.

And that's one of them.
 
The thing about using MD5 hashes is simply that it leaves the original files alone and creates a hash file...
So, as I understand it, this is already included in RAR.
I don't know enough about RAR to know how it works on that. (For a long time I haven't understood why people still use RAR, since it was never free. I didn't know it had this recovery capability, though it still seems niche.)

I've used it for a very long time. It's not so niche; a lot of people use it when sharing files.
The debate is often between 7-Zip and RAR. The reason why I stuck with RAR over time is that some see the Recovery Record as an easy-to-use extra safety that 7-Zip does not offer; 7-Zip needs extra parity files to achieve the same thing. Overall, RAR is more stable than 7-Zip... and recently I have seen a coding YouTuber review how poorly 7-Zip's code (or parts of it) is written... And those who defend 7-Zip often mistake the Recovery Record for lesser-performing compression (last time I checked, which was years ago, they were roughly equal in terms of compression ratio)... Though that's about 7-Zip, which I often install on my system anyway, and I also use 7-Zip at work because companies don't want to pay for licenses.
But I guess the main reason why I still use RAR is that I was already using it before; it is popular/standard enough and comes with easy features I can use.
How big are the RAR files you've got?

Most of my Recovery Records are 3% (it's the default value); sometimes I set 5%.

(Don't forget to rotate which one is on-site and which one is off, so one isn't getting more read activity than the other.)

I sometimes do that, sometimes not. But the drive that is on site is mostly stored outside of any system anyway, so the difference is not big. Whenever I re-sell some, they usually feel like a car with very low mileage.

I don't think I'd even be bothering with this method of recovery though, if I have two copies already...

The thing is that we are focusing on the Recovery Record in this thread because it is relevant to the Allocation Unit Size question.
It is also not my "most required" feature... though it has proved to be rather efficient at times.
What I think is the most important feature in terms of formats is to have ALL content files inside TESTABLE containers. The fact that these files are testable is very important for spotting and addressing issues.
An exception I know of is FLAC files, as they can be tested using flac.exe. But if you leave an image or a text file alone, it may get corruption that alters its content, and how would you realise? Or a random file from, let's say, 12 years ago that you didn't need to access recently can't be opened... You can't go through each individual file, test it manually and check its contents.


That said, I think 2MB cluster sizes are likely acceptable in terms of recoverability if you go with 5% recovery data, even 3%.

I think that choosing anything between 4 kB and 2 MB won't have that huge of an impact. But I made this thread because I wanted to understand if I could optimise that choice.
Maybe I simply never "overthought" it in the past.

Incidentally, if you got something other than a conventional magnetic recording (CMR) drive, such as a shingled (SMR) drive, the potential for more widespread damage probably increases massively, because the physical tracks overlap each other. Damage to one could affect bits on two other tracks.
I checked; I think all the models I have are CMR.


Exactly the point. The rest of this is just trying to find solutions for a problem that was solved long ago. If you trust a drive that has already exhibited sector failures or bit rot, that's your call, but there is no magic bullet that is going to give you the result you "desire". It's just a waste of time and energy when better solutions already exist.

Why do you keep derailing the thread implying stuff that was not actually said?
 
That said, I think 2MB cluster sizes are likely acceptable in terms of ...
Oh, and what I'm thinking right now is, maybe, to have one copy's Allocation Unit Size set to 64 kB (or such) and the other set to 1 or 2 MB (one would fill slightly faster than the other). I'm not in a rush to decide right now.
 
I'm at a loss to understand why you want to mess with the drive formatting and allocation unit; questions come to mind.

You said you have many (many) TB of data to back up, 50 to 75 TB. How are you dividing your data, as you must be using multiple drives? Are you wiping the drives between backups?

Assuming you are using a clean drive, the device will write the backup in the most efficient manner: data will be written serially in contiguous blocks. The allocation unit size will be of little consequence.
The allocation unit size was a fudge; it mattered when drives were FAT16 and FAT32. Increasing the allocation unit size allowed for larger volume sizes because the number of allocation-table entries was limited. NTFS blew the limit out of the water. It's not a concern today. Maybe in the future, but not today.
Larger allocation units are a bad thing for general use. As noted earlier, with a 2MB allocation unit a 2-byte file wastes the remaining 2,097,150 bytes of that allocation unit.
You say that RAR uses 64k chunks… 4k to 64k allocation units might make some sense but any gains would be marginal at best and nonexistent on a drive that was empty immediately prior to the backup being written.
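To put rough numbers on that slack point, here's a small sketch (the file sizes are hypothetical) comparing wasted space per file for different allocation unit sizes:

```python
def slack(file_size: int, cluster: int) -> int:
    """Bytes allocated beyond the file's real size (the wasted tail of its last cluster)."""
    if file_size == 0:
        return 0
    return ((file_size + cluster - 1) // cluster) * cluster - file_size

# Hypothetical file sizes: a 2-byte text file, a ~137 MB file, a ~2.5 GB RAR part.
sizes = [2, 137_000_000, 2_500_000_000]
for cluster in (4 * 1024, 64 * 1024, 2 * 1024 * 1024):
    print(f"{cluster // 1024:>5} KB clusters -> slack per file:",
          [slack(s, cluster) for s in sizes])
```

Only the last cluster of each file is partially used, so on an archive drive holding a modest number of large RAR volumes, even 2MB units waste at most about 2MB per file; on a drive full of tiny files the waste adds up quickly.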