how much corruption to expect from write-back caching + power loss

Jormungandr

Distinguished
Jun 5, 2011
These are two related questions that have been on my mind for a few years now, and I've never really found a satisfying answer.


    ■ How bad is data loss caused by power outages when using write-back caching, generally speaking?
    ■ How bad is it in the worst case scenario?


If it helps to have a specific use case to work with, I currently have a 6-drive RAID 10 array on an Adaptec RAID card. The RAID card does not have a battery backup for its cache. The resulting logical drive is used for storage of large quantities of valuable data, including a great deal of lossless video waiting to be processed into finished work (I do freelance multimedia work on the side). I often need to write 100-150 megabytes of data per second to the drive, hence the choice to switch to write-back caching. But I'd be better off looking for faster compression or even lowering the quality of my working files than putting all that data at risk every time I write to the drive.


-- More Detail --

I'm a computer tech (~15 years), so I have a middling idea of the issue created by write-back caching. But, as a for-hire tech, my job is usually more about solving the problem than understanding every detail of the causes. I'm like a general practitioner vs a specialist. You want your computer up and running again? Call me. You want to know the nitty-gritty of why it stopped? Call someone who builds those components.

I know that write-back caching reports data as written as soon as the data reaches the cache... which means that the data is still reliant on a constant power source to remain viable until it is actually written to the storage drive. If power to the cache media is lost before the data is fully written to the storage media, data loss can occur.
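
If I try to sketch my mental model of that in code, it comes out something like this (a toy illustration only, nothing like real controller firmware): the write call reports success as soon as the data lands in volatile RAM, and the copy to the actual disk happens some time later, when the cache is flushed.

    #include <stdio.h>
    #include <string.h>

    /* Toy model of a write-back cache - just an illustration of why
     * "write completed" does not mean "data is on the disk". */

    #define BLOCK_SIZE  512
    #define CACHE_SLOTS 8

    struct cache_slot {
        int  in_use;               /* slot holds dirty data not yet on disk  */
        long lba;                  /* destination block on the backing store */
        char data[BLOCK_SIZE];
    };

    static struct cache_slot cache[CACHE_SLOTS];
    static char disk[64][BLOCK_SIZE];      /* stand-in for the physical disk */

    /* Write-back: copy into RAM and report success immediately. */
    int cached_write(long lba, const char *buf)
    {
        for (int i = 0; i < CACHE_SLOTS; i++) {
            if (!cache[i].in_use) {
                cache[i].in_use = 1;
                cache[i].lba = lba;
                memcpy(cache[i].data, buf, BLOCK_SIZE);
                return 0;          /* "done" - but only in volatile RAM */
            }
        }
        return -1;                 /* cache full (real firmware would evict) */
    }

    /* The slow part happens later; power loss before this runs loses data. */
    void flush_cache(void)
    {
        for (int i = 0; i < CACHE_SLOTS; i++) {
            if (cache[i].in_use) {
                memcpy(disk[cache[i].lba], cache[i].data, BLOCK_SIZE);
                cache[i].in_use = 0;
            }
        }
    }

    int main(void)
    {
        char block[BLOCK_SIZE] = "important data";
        cached_write(3, block);    /* caller is told the write succeeded...  */
        /* ...but if power dies here, disk[3] still holds its old contents.  */
        flush_cache();             /* only now is the data actually durable  */
        printf("on disk: %s\n", disk[3]);
        return 0;
    }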

And that's as good an answer on the subject as I've found. But it doesn't really explain much.

"Data loss" for example, is a very generic term here. Are we talking about a couple bytes of data or the entire filesystem? Both are "data loss" but one of those only worries me a little while the other has the potential to fill my underwear with fecal matter.

On the face of it I would assume the only data at risk is the data in the cache but not yet written to the disk. But that doesn't quite make sense. The same risk exists with data in a write-through cache. Or even write-around. Or even no cache at all. If the only copy of the data is in memory when power loss occurs, it will be lost, no matter what kind of caching is (or isn't) used or whether the system has been notified that the data is finished writing. The only case where I can think of this not being a risk for non-write-back caching methods is when moving data from one storage medium to another. If a power loss happens while trying to save that 500-megabyte poster you've been working on for 6 hours, you're going to lose it whether the OS and photo-manipulation software have been erroneously informed that the data is safe on the storage media or not.

Which makes me pause and question why such emphasis is put on the dangers specific to write-back caching when that risk would only apply to a small percentage of write operations. Which brings me back to wondering whether it affects more than just the cached-but-unwritten data.

And it could... I suppose. What if Windows was modifying the MFT for the filesystem and, because it had been informed that its changes were fully written, went ahead and started modifying the backup MFT too... only a power loss happens and now both the main and backup MFTs are inconsistent. I can see that resulting in a terrifying level of data loss under just the right circumstances. Only I'm not really sure that's how NTFS handles updates to MFTs. It's one of the many tiny details that a computer tech doesn't really need to know. And that's before we throw in things like parity RAID and other already delicate storage paradigms.

So can anyone help me untangle this mess of "but what if"s? Any info would be appreciated.
 

molletts

Distinguished
Jun 16, 2009
The main problem with write-back caching on NTFS volumes is that it can affect the order in which data sent by the system to the RAID controller actually reaches the physical disks.

NTFS, like most modern file systems (unlike FAT32!), is what is known as a journalling file system. When a change is made to the file system, that change is first recorded in the journal (on NTFS, this is a hidden system file called $LOGFILE). The change is then made to the actual file system and, once it has been completed successfully, the entry in the journal is marked as complete. If power fails before the changes are completed, NTFS can see that there are not-yet-completed changes in the journal and decide to either complete them or roll them back (this is called "replaying the journal").
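
To make that sequence concrete, here is a bare-bones sketch of the protocol in C. This is my own simplification of the idea behind $LOGFILE, not its actual record format - write_block() and flush_disk() are just stubs standing in for real disk I/O:

    #include <stdio.h>
    #include <string.h>

    /* Bare-bones write-ahead journalling sketch. */

    enum txn_state { TXN_PENDING, TXN_COMMITTED };

    struct journal_entry {
        enum txn_state state;
        long           target_block;    /* where the change will be applied */
        char           new_data[64];    /* what will be written there       */
    };

    static void write_block(const char *where, const void *data, size_t len)
    {
        (void)data; (void)len;
        printf("write %s\n", where);    /* stand-in for a real block write  */
    }

    static void flush_disk(void)
    {
        printf("flush (wait until previous writes are on stable media)\n");
    }

    static void journalled_update(struct journal_entry *je)
    {
        /* 1. Record the intended change in the journal and make it durable. */
        je->state = TXN_PENDING;
        write_block("journal entry (pending)", je, sizeof *je);
        flush_disk();

        /* 2. Apply the change to the real file-system structures. */
        write_block("file-system structures", je->new_data, sizeof je->new_data);
        flush_disk();

        /* 3. Mark the journal entry complete. */
        je->state = TXN_COMMITTED;
        write_block("journal entry (committed)", je, sizeof *je);
    }

    /* After a crash, anything still TXN_PENDING gets finished off or rolled
     * back when the journal is replayed, so the metadata stays consistent. */

    int main(void)
    {
        struct journal_entry je = { TXN_PENDING, 42, "new MFT record" };
        journalled_update(&je);
        return 0;
    }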

Some changes may depend on others - a file size increase in the MFTs, indexes, free space bitmap, etc. may depend on the additional file data also being written to the disk. This might be represented by two journal entries - a "metadata" transaction to update the file system structures and a "data" transaction to track whether the data has been written. The metadata transaction would depend on the data one, so if, when replaying the journal after a failure, the data transaction is still incomplete, the metadata one would be rolled back (or at least not replayed). But if the data transaction had been marked complete, the metadata would be updated to reflect the new file size.
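
The replay decision for that dependency might be sketched like this (again purely illustrative - I'm not claiming this is how NTFS actually lays out its log records):

    #include <stdio.h>

    /* Sketch of replaying a metadata transaction that depends on a data
     * transaction. Purely illustrative. */

    enum txn_state { TXN_PENDING, TXN_COMMITTED };

    struct txn {
        const char     *what;
        enum txn_state  state;
        struct txn     *depends_on;   /* NULL if independent */
    };

    static void replay_metadata(struct txn *meta)
    {
        if (meta->state == TXN_COMMITTED) {
            printf("%s: already complete, nothing to do\n", meta->what);
        } else if (meta->depends_on && meta->depends_on->state != TXN_COMMITTED) {
            /* The data it depends on never made it to disk: roll back. */
            printf("%s: dependency incomplete, rolling back\n", meta->what);
        } else {
            printf("%s: dependency satisfied, redoing the change\n", meta->what);
        }
    }

    int main(void)
    {
        /* Crash happened before the file data was confirmed on disk... */
        struct txn data = { "data write",       TXN_PENDING, NULL  };
        struct txn meta = { "file-size update", TXN_PENDING, &data };
        replay_metadata(&meta);   /* rolled back: the size never changes   */

        /* ...versus a crash after the data transaction was marked complete. */
        data.state = TXN_COMMITTED;
        replay_metadata(&meta);   /* replayed: the new size becomes visible */
        return 0;
    }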

(Some file-systems even offer data journalling, either by default or as an option. This, of course, usually has a performance penalty because the data needs to be written twice - first to the journal and then to the actual file. More advanced file systems work around this by effectively having a moving journal, whereby the data in the journal becomes the new file contents when the metadata is updated and a new area of disk is assigned to the journal in its place.)

If there is a write-back cache between the OS and the disks, it is possible that entries may not go into the journal before the changes to the main file system, or that they may be marked as complete before the changes are actually made. The RAID controller may well see the sequence of "pop some data in place A" (write the journal entry) - "do something in place B" (make the changes) - "update the data in place A" (mark the journal entry complete) as being ripe for optimisation: why go to A, then to B, then back to A when you could do it more efficiently by simply writing both changes to A at once and then going on to B? With a battery-backed cache (or a flash-memory cache), the data for B will be retained and will get written back to B when power is restored, but without some kind of non-volatile cache it will be lost and the file system will end up inconsistent.
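
Here is a tiny simulation of that hazard (toy code, not any controller's real optimisation logic): the host issues its writes in the order A, B, A; the cache coalesces the two writes to A and writes them back first; then power fails before B reaches the disk, leaving a journal that claims data which was never written.

    #include <stdio.h>
    #include <string.h>

    #define BLOCKS 4

    static char disk[BLOCKS][32];                 /* the physical platters   */
    static char cache[BLOCKS][32];                /* the controller's RAM    */
    static int  dirty[BLOCKS];

    static void cached_write(int lba, const char *s)
    {
        strcpy(cache[lba], s);                    /* coalesces with any      */
        dirty[lba] = 1;                           /* earlier write to lba    */
    }

    static void write_back_one(int lba)           /* cache picks its own order */
    {
        if (dirty[lba]) { strcpy(disk[lba], cache[lba]); dirty[lba] = 0; }
    }

    int main(void)
    {
        /* Host's intended order: journal entry (A), data (B), mark complete (A). */
        cached_write(0, "journal: pending");      /* A                        */
        cached_write(1, "new file data");         /* B                        */
        cached_write(0, "journal: complete");     /* A again - coalesced      */

        /* The controller writes back A first (one seek instead of two)...   */
        write_back_one(0);
        /* ...and power dies before B is ever written back.                  */

        printf("disk block 0: %s\n", disk[0]);    /* "journal: complete"      */
        printf("disk block 1: %s\n", disk[1]);    /* empty - data never landed */
        return 0;
    }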

Best case scenario - you won't notice any difference but you might get inconsistencies reported next time you run CHKDSK. Worst case scenario - unmountable volume.

There's always a small risk, even without a big RAID controller cache, unless you also disable the write-back caches on the hard drives themselves, but on-drive caches are generally so small that the risk is acceptable for normal day-to-day usage.

There are mechanisms for mitigating or (virtually) eliminating the caching risk while retaining most of the benefits of caching. Many hard drives support a feature called Force Unit Access (FUA), which allows the OS to request that the drive bypass its write-back cache for a specific write operation; I don't know whether your RAID controller offers this on its logical disks. On drives that don't support it, a less refined technique can be used whereby the OS tells the drive to flush its cache at strategic moments - perhaps after writing a bunch of transactions to the journal, then after writing the related data, prior to updating the file-system structures, followed by a final flush to commit the changes. These commands may well be ignored by RAID controllers with big caches because they would cause a significant wait while the cache was being flushed.
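
For what it's worth, this is roughly what those two mechanisms look like from the application side on Windows (a minimal sketch - the file path is made up, and whether the request is honoured end-to-end depends on the drive, the driver and the RAID firmware playing along): FILE_FLAG_WRITE_THROUGH asks for write-through/FUA semantics on every write to that handle, and FlushFileBuffers() issues an explicit cache-flush request.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* FILE_FLAG_WRITE_THROUGH asks that each write bypass write-back
         * caching (on drives that support it, this maps to FUA writes).
         * The path is just an example. */
        HANDLE h = CreateFileA("D:\\test.dat",
                               GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                               FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH,
                               NULL);
        if (h == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "CreateFile failed: %lu\n", GetLastError());
            return 1;
        }

        const char data[] = "must survive a power cut";
        DWORD written = 0;
        if (!WriteFile(h, data, sizeof data, &written, NULL)) {
            fprintf(stderr, "WriteFile failed: %lu\n", GetLastError());
            CloseHandle(h);
            return 1;
        }

        /* The cruder alternative: an explicit "flush your cache" request,
         * which the OS also uses at strategic points in its own journalling. */
        if (!FlushFileBuffers(h)) {
            fprintf(stderr, "FlushFileBuffers failed: %lu\n", GetLastError());
        }

        CloseHandle(h);
        return 0;
    }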

I hope all this has helped!

Stephen