These are two related questions that have been on my mind for a few years now, and I've never really found a satisfying answer.
■ How bad is data loss caused by power outages when using write-back caching, generally speaking?
■ How bad is it in the worst case scenario?
If it helps to have a specific use case to work with, I currently have a 6-drive RAID 10 array on an Adaptec RAID card. The RAID card does not have a battery backup for its cache. The resulting logical drive is used for storage of large quantities of valuable data, including a great deal of lossless video waiting to be processed into finished work (I do freelance multimedia work on the side). I often need to write 100-150 megabytes of data per second to the drive, hence the choice to switch to write-back caching. But I'd be better off looking for faster compression or even lowering the quality of my working files than putting all that data at risk every time I write to the drive.
-- More Detail --
I'm a computer tech (~15 years), so I have a middling idea of the issue created by write-back caching. But, as a for-hire tech, my job is usually more about solving the problem than understanding every detail of the causes. I'm like a general practitioner vs. a specialist. You want your computer up and running again? Call me. You want to know the nitty-gritty of why it stopped? Call someone who builds those components.
I know that write-back caching reports data as written as soon as the data reaches the cache... which means that the data is still reliant on a constant power source to remain viable until it is actually written to the storage drive. If power to the cache is lost before the data is fully written to the storage media, data loss can occur.
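To pin down what I mean, here is a minimal toy sketch of that behavior in Python. It is not how any real controller (Adaptec or otherwise) is implemented; `ToyController` and `acknowledged_but_missing` are made-up names, and the only thing being modeled is the point at which a write gets acknowledged.

```python
class ToyController:
    """Toy model of a controller cache. Hypothetical, not any real card's logic."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.cache = {}  # volatile: contents vanish on power loss
        self.disk = {}   # persistent: contents survive power loss

    def write(self, block, data):
        """Store one block and return True once the write is acknowledged."""
        if self.write_back:
            # Write-back: acknowledged while the data is only in the cache.
            self.cache[block] = data
        else:
            # Write-through: acknowledged only after the data is on disk.
            self.disk[block] = data
        return True

    def flush(self):
        """Move everything still in the cache onto the disk."""
        self.disk.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        """Simulate losing power with no battery backup on the cache."""
        self.cache.clear()


def acknowledged_but_missing(write_back):
    """Return the blocks the OS was told were written but that are not on disk."""
    ctrl = ToyController(write_back)
    acked = [b for b in range(4) if ctrl.write(b, f"data-{b}")]
    ctrl.power_loss()  # power fails before any flush happens
    return [b for b in acked if b not in ctrl.disk]
```

In this toy model, write-back leaves every acknowledged block missing after the power loss, while write-through leaves none missing. The difference is not whether unwritten data can be lost, but whether the OS was told it was safe when it was not.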
And that's as good an answer on the subject as I've found. But it doesn't really explain much.
"Data loss" for example, is a very generic term here. Are we talking about a couple bytes of data or the entire filesystem? Both are "data loss" but one of those only worries me a little while the other has the potential to fill my underwear with fecal matter.
On the face of it, I would assume the only data at risk is the data in the cache but not yet written to the disk. But that doesn't quite make sense. The same risk exists with data in a write-through cache. Or even write-around. Or even no cache at all. If the only copy of the data is in memory when power loss occurs, it will be lost, no matter what kind of caching is (or isn't) used or whether the system has been notified yet that the data is finished writing or not. The only case I can think of where this is not a risk for non-write-back caching methods is when moving data from one storage medium to another. If a power loss happens while trying to save that 500-megabyte poster you've been working on for 6 hours, you're going to lose it whether the OS and photo-manipulation software have been erroneously informed that the data is safe on the storage media or not.
Which makes me pause and question why such emphasis would be put on the dangers specific to write-back caching when that risk would only apply to a small percentage of write operations. Which brings me back to wondering if perhaps it affects more than just the cached-but-unwritten data.
And it could... I suppose. What if Windows was modifying the MFT for the filesystem and, because it had been informed that its changes are fully written, goes ahead and starts modifying the backup MFT too... only a power loss happens and now both the main and backup MFTs are inconsistent. I can see that resulting in a terrifying level of data loss under just the right circumstances. Only I'm not really sure that's how NTFS handles updates to MFTs. It's one of the many tiny details that a computer tech doesn't really need to know. And that's before we throw in things like parity RAID and other already delicate storage paradigms.
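A toy sketch of that "what if", again in Python, in case it makes the scenario easier to confirm or shoot down. Everything here is an assumption for illustration: the two-block "primary" and "backup" metadata copies, the `run` function, and the flush order the controller picks are all invented, and none of it is meant to describe how NTFS or any real controller actually behaves.

```python
def run(write_back, blocks_on_disk_before_power_loss):
    """Return the post-crash state of (primary, backup) metadata copies.

    Each copy spans two blocks. A copy is "old" or "new" if its blocks
    agree, and "torn" if only one of its two blocks was updated.
    """
    disk = {"P0": "old", "P1": "old", "B0": "old", "B1": "old"}
    cache = {}  # volatile controller cache, lost on power failure

    # The OS updates the primary copy first, then the backup copy,
    # issuing each write only after the previous one was acknowledged.
    os_order = ["P0", "P1", "B0", "B1"]

    if write_back:
        # Every write is acknowledged instantly, so all four are issued.
        for blk in os_order:
            cache[blk] = "new"
        # Assumption: the controller may flush in any order it likes.
        flush_order = ["P0", "B0", "P1", "B1"]
        for blk in flush_order[:blocks_on_disk_before_power_loss]:
            disk[blk] = cache.pop(blk)
    else:
        # Write-through: blocks reach the disk in the order the OS issued them.
        for blk in os_order[:blocks_on_disk_before_power_loss]:
            disk[blk] = "new"

    cache.clear()  # power loss

    def state(a, b):
        return disk[a] if disk[a] == disk[b] else "torn"

    return state("P0", "P1"), state("B0", "B1")
```

With power failing after two blocks reach the disk, the write-back case in this sketch ends with both copies torn, while the write-through case ends with an intact new primary and an intact old backup. If real filesystems and controllers can get into anything like the first state, that would explain the emphasis on write-back specifically.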
So can anyone help me untangle this mess of "but what if"s? Any info would be appreciated.