Rusting In Peace

Distinguished
Jul 2, 2009
1,048
0
19,460
I've spotted something curious on the ZFS wiki page about using this file system in conjunction with hardware RAID:

"ZFS can not fully protect the user's data when using a hardware RAID controller, as it is not able to perform the automatic self-healing unless it controls the redundancy of the disks and data. ZFS prefers direct, exclusive access to the disks, with nothing in between that interferes. If the user insists on using hardware-level RAID, the controller should be configured as JBOD mode (i.e. turn off RAID-functionality) for ZFS to be able to guarantee data integrity"

There are no links on the wiki to any sources that confirm this, so I'm wondering: is it actually true?
 
Solution
The problem is that if the underlying RAID controller detects an error it will try to do its own recovery of the data. If it's unable to do so it will declare the volume dead and it won't permit the ZFS file system to access the data in order to recover at that level.
 

adampower

Distinguished
Apr 20, 2010
452
0
18,860
That's right. ZFS is a file system which incorporates RAID-like features. It stores a checksum for every block and verifies it whenever the block is read back; if the check fails and ZFS controls the redundancy, it repairs the block from a good copy. IMHO it's like file system/disk level ECC.
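To make the self-healing idea concrete, here is a minimal sketch of checksum-on-read with repair from a mirror. It is not ZFS's actual code: the `MirroredBlock` class, the two-way mirror, and the use of SHA-256 are all assumptions chosen for illustration.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """Checksum used to validate a block (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).digest()


class MirroredBlock:
    """Hypothetical two-way mirror holding one block and its expected checksum."""

    def __init__(self, data: bytes):
        # The checksum is kept apart from the data copies it protects.
        self.expected = checksum(data)
        self.copies = [bytearray(data), bytearray(data)]

    def read(self) -> bytes:
        # Find the first copy whose checksum matches what was stored.
        good = None
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                break
        if good is None:
            raise IOError("all copies fail the checksum; block is unrecoverable")
        # Self-heal: overwrite any copy that differs from the verified one.
        for i, copy in enumerate(self.copies):
            if bytes(copy) != good:
                self.copies[i] = bytearray(good)
        return good


block = MirroredBlock(b"important data")
block.copies[0][0] ^= 0xFF           # simulate silent corruption on one disk
recovered = block.read()             # returns the good copy and repairs the bad one
```

The repair step only works because the code can see both copies. A hardware RAID volume presents a single device, so a file system sitting on top of it can detect the bad checksum but has no second copy to repair from.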

Referring to the other thread, that's why I went with ECC RAM in my NAS build. The data I send to the NAS is the data that WILL be written and, conversely, read back one day. My data is not worth millions of dollars, but losing a file that you thought you had sucks. RAM's cheap and ECC is not much more. Ultimately one should have ECC on the pushing computer as well but... everyone has a limit, and mainstream Intel boards don't support ECC RAM.

Your quoted paragraph is typical of Wikipedia. The information is basically correct but poorly worded and not sourced. Not that I could do better.

Check out ZFSGURU.com if you're interested in submesa's little project. I'm sure he would recommend using PCIe-based RAID cards solely for gaining SATA ports. And btw, NEVER PCI-based controllers. Don't know why.
 
Solution
You should have ECC in your desktop system as well, since that's the source of the data that's sent to the NAS and then written to the disk.

ECC memory should be standard in all systems, IMHO. Memory is very reliable but problems do happen, and it's the ONLY component in a modern computer system that doesn't have any provision at all for detecting and reporting errors.
 

Rusting In Peace



A RAID controller will do error recovery? I thought it would just degrade the array? What types of errors are we talking about here?



Yeah, I didn't want to take over that other thread, so thanks for posting here about the RAM and ZFS. I can see why it's important for a system using ZFS. I was just concerned that I had got something horrifically wrong by not using ECC in my NAS. I can understand its benefit, but I guess it's not essential for success.

 

adampower



As sminlal said, every computer should use ECC RAM. Why the technology is reserved for enterprise servers is beyond me. If you've been around computers for long you have seen corrupted files. How did they get that way? We don't know, but we have ways to limit the chances at some steps. ECC RAM greatly reduces the chance that something going into RAM comes out different. Next, ZFS makes sure the data is written to disk without corruption. Coming back, we read the data and bring it back into the RAM buffer. That makes three chances to corrupt the data, and two of them are in RAM.
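For anyone wondering how memory can fix a flipped bit at all, here is a small sketch of the principle using a Hamming(7,4) code. This is an illustration only: real ECC DIMMs use a wider code over 64-bit words, and the function names below are made up for the example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); position 0 means no error was seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # 1-based index of the flipped bit
    if position:
        c[position - 1] ^= 1          # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], position


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1                                   # simulate a bit flip in memory
data, flipped_at = hamming74_decode(stored)      # data is restored, flip is located
```

A code like this corrects any single flipped bit per word. Two flips in the same word are beyond what this small version can fix, which is why ECC lowers the risk rather than removing it entirely.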

The infinitesimal error rate at the bit level becomes much more significant as we build GB and TB worth of files. Some paranoid computer users feel manufacturers' ability to eliminate errors has not kept up with software bloat. Does that mean more files are being corrupted now than ever? Maybe. Regardless of the numbers, it is prudent to be careful with data, especially when it's easy to do so. ECC RAM runs at the same speed as regular DDR3-1333. It just carries an extra error-checking bit for every byte (72 bits for each 64-bit word). In this case the technology to avoid errors has evolved to facilitate HUGE file creation. Why not take advantage?
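A rough back-of-the-envelope sketch of how a tiny per-bit error rate scales with data size. The rate used here (`ASSUMED_RATE`) is a made-up placeholder, not a measured figure for any real hardware, and the model assumes each bit fails independently.

```python
import math


def prob_at_least_one_error(bits: int, per_bit_rate: float) -> float:
    """Probability that at least one of `bits` independent bits is wrong.

    Computes 1 - (1 - p)^n via log1p/expm1 so tiny rates don't round to zero.
    """
    return -math.expm1(bits * math.log1p(-per_bit_rate))


ASSUMED_RATE = 1e-15  # hypothetical per-bit error rate, for illustration only

for label, size_bytes in [("1 GB", 10**9), ("1 TB", 10**12)]:
    p = prob_at_least_one_error(size_bytes * 8, ASSUMED_RATE)
    print(f"{label}: chance of at least one bad bit = {p:.2e}")
```

Whatever the true rate is, the chance of hitting at least one error grows roughly in proportion to the amount of data while that chance is still small, so a thousand times more data means roughly a thousand times the risk.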

Most ASUS AMD boards support ECC RAM. Some others do too, I'm sure. Unfortunately, Intel desktop chips are better in every way. My desktop runs on Intel. But I get a twitch sometimes and think I should jump ship for the simple reason that AMD supports ECC.
 
Of course it will - that's the whole point of a redundant RAID configuration. If a drive dies (the error), you replace it and the RAID controller rebuilds the set (the recovery). But if, during the course of the rebuild, it can't read required data from one of the other surviving disks then as far as the controller is concerned the whole volume is toast.

All of this takes place below the ZFS level - the ZFS file system has no ability to control or recover from this process.
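A simplified sketch of the rebuild scenario described above, assuming a single-parity (RAID 5 style) stripe. The function names and the use of `None` to mark an unreadable block are illustrative assumptions, not how any particular controller works.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe, failed_index):
    """Rebuild the block of the failed disk from all surviving blocks.

    `stripe` holds one block per disk; None marks a block that cannot be read.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    if any(blk is None for blk in survivors):
        # With single parity, a second unreadable block makes the stripe unrecoverable.
        raise IOError("rebuild failed: a surviving disk returned an unreadable block")
    return xor_blocks(survivors)


data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_blocks(data)]        # three data disks plus one parity disk

rebuilt = rebuild_missing(stripe, 1)      # disk 1 died; its block comes back intact

stripe[2] = None                          # a surviving disk now hits a read error
try:
    rebuild_missing(stripe, 1)
except IOError as err:
    print(err)
```

The failure in the second case happens entirely inside the rebuild routine, which is the point being made: the file system above never gets a chance to step in.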
 
I built my current system around a Xeon W3520 CPU because it supports ECC. This CPU is essentially identical to the Core i7 920, but it has a smarter memory controller to handle the ECC. The CPU was harder to find - I had to buy it from a distributor that handles enterprise equipment - but it was literally only $1.00 more than the equivalent i7 920 chip.

The ECC memory was also a very small cost increment - I only paid about $20 extra for 12GB of it. The biggest cost factor was having to buy a W3520-compatible motherboard (it uses the same socket but the BIOS has to be smart enough to configure the ECC registers in the CPU). When you add in the cost of everything else (case, power supply, video, disks, etc.) the overall cost premium to have ECC memory was only about 5%.