[SOLVED] best archival strategy

Nov 12, 2012
25
0
10,540
I'm trying to settle on the best archival strategy for data. I'd like a strategy that will ensure its viability decades from now. Hardware obsolescence is NOT an issue. There will always be someone around who can read a floppy disk or digitize 8mm movies (I just did that for 70-year-old movies).

Right now, I'm using M-disks, which look like a pretty good solution. But with a capacity of only a few gigs, it takes many of them.

Hard drives are OK, but I'm uncomfortable with mechanical devices. If a bearing goes out, you've lost your data, even if the magnetic domains are solid.

Cloud storage is fine, but you're just depending on someone else to do it responsibly.

USB drives sound good, but everyone seems to say that they won't hold data for more than a decade. No better than plain old DVDs. Seems to be an issue of cell leakage and cell degradation from lots of writes.

I'm wondering if properly maintained USB drives are the way to go. Very compact. As in, rewrite the whole thing every decade. Refresh the bits, and because you're doing it very infrequently, you aren't degrading the cells.

It would be nice to see a contemporary essay about digital archiving strategies. Can anyone point me to one?
 

USAFRet

Titan
Moderator
Decades?
No single platform or media is 100% guaranteed to last that long.
You can't just copy it on to something, never touch it, and then hope it will read.

You'd never know until, 30 years from now, you crank up that drive and find it has already failed.

We've not been in this game long enough to really tell.
Predictions from artificial aging are one thing. But 30 years ago was 1988. Terabyte hard drives, DVDs, and USB sticks were a pipe dream.

The best way is a periodic refresh, onto some new media. And more than one copy.
Not only bitrot, but external fail. Fire/flood/theft/etc.

Of those 70-year-old movies...if you had 100 of them...would you be absolutely sure that 100% of them were 100% playable? How many miles of old film have been lost forever?
Analog is a whole lot more forgiving than digital. A single corrupt frame in an old movie would be unfortunate, but not critical.
A single corrupt block, sector, or bit in a digital format may render the whole thing unreadable.
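To make that concrete, here is a toy Python demonstration (not anyone's real archive): flip a single bit in a zlib-compressed payload, and the whole payload typically becomes unrecoverable, because the decompressor's integrity check no longer matches.

```python
import zlib

# Toy illustration: one flipped bit in a compressed stream
# usually makes the entire payload unrecoverable.
original = b"family photos and home movies, " * 1000
compressed = bytearray(zlib.compress(original))

# Flip a single bit somewhere in the middle of the stream.
compressed[len(compressed) // 2] ^= 0x01

try:
    zlib.decompress(bytes(compressed))
    print("decompressed fine (rare)")
except zlib.error as e:
    print("unrecoverable:", e)
```

Analog film degrades gracefully; compressed or structured digital formats tend to fail all at once, which is exactly the point being made above.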
 
Nov 12, 2012
Sorry, but M-disks are 100% guaranteed to last that long without being touched. But they are only 4.7GB each.

HVDs? No, those are just optical disks, which have the same degradation problems as DVDs. Their only advantage is data density, not longevity.

Fire/flood/theft/obsolescence aren't my concern here. Mitigation of those concerns is a completely separate issue from digital data preservation.

My 8mm movies were 100% playable according to the service that did the transfer. Some splices broke, but I am told splices don't preserve as well as the film itself.

Yes on periodic refresh. I have no problem "touching" them occasionally. But always onto new media? As in, chuck my 128GB USB drive and get a new one every decade?? That seems kinda silly.
 

USAFRet

Titan
Moderator


Yes, I know the claim about M-disks being 100% guaranteed.
But until we actually reach that time span...it is simply a prediction.

No, not necessarily toss out the old drive. But new data sizes generally make old drives obsolete.
1988, a 20 megabyte hard drive was state of the art. Today, that size is not even a rounding error on a $40 1TB drive.

For your 128GB USB? I'd do it much more often than once a decade. Even if you just copy it elsewhere, reformat, and copy it back.


What is the size of data you're looking at, and what type of data? Corporate, personal, other?
 
Nov 12, 2012
Correct about the prediction for M-disks -- it's just a prediction, but the resiliency tests that have been done on them are extremely impressive.

Let's say I'm looking at 100 MB for one archive. Happens to be personal, but data is data. Wouldn't matter if it was corporate, royalty, or military.

Not sure what you're saying about my 128 GB USB. You're saying that if I rewrite it every few years I'd be safe? That's a doable archival strategy. That would hardly be exercising the bits, which is what degrades them, and the leakage failure time scale ought to be a lot longer.
 

USAFRet

Titan
Moderator


Yes, rewrite every once in a while.
But make two or three copies, on different brands or media types. Don't entrust this to a single item. It's unlikely that all of them would go bad in that time span.

I have data living on my systems that I created in the late '80s and early '90s.
Of course, it is not on the original drives. Some data that I created in 1978 was on a 10 megabyte HDD. In 1993, on a 125 megabyte drive. Those drives are long gone. The data survives, subsumed into drives and platforms that are much, much larger.

Stuff that was created on a 10 megabyte drive now lives on a NAS volume of 12 TB.
 
Nov 12, 2012
You understand that the Laboratoire National de Métrologie et d'Essais study evaluated disk lifetimes at 90°C. As in, 194°F. Yes, if my house burns down for 250 hours, they might not be safe. And those Syylex disks are much pricier. A few hundred dollars each a year or two ago.

And yes, you can get Blu-Ray M-disks. The $/GB is about the same as for the 4.7GB disks.
 

USAFRet

Titan
Moderator
Reading material, from the Library of Congress:
http://digitalpreservation.gov/personalarchiving/

--------------
Check your photos at least once a year to make sure you can read them.
Create new media copies every five years or when necessary to avoid data loss.
--------------
 

USAFRet

Titan
Moderator


Yes, but that speaks to artificial aging.
90°C is not natural. But 30 years at room temp is also untestable, because those media have not existed for that long.

There's only so much you can simulate.
 
In terms of simplicity, cloud storage is your best bet. Yes it just means you're passing off the task of maintaining the data to someone else. But if you choose a decent provider, they're assiduous with backups. They'll also upgrade their storage over time, basically doing the same thing as buying a new (larger) HDD and copying all your data over to it. Don't think of it as offloading these tasks onto someone else. Think of it as pooling your resources with other people who want to archive data, and hiring someone whose sole job is to do it for everyone.

For larger files (like movies) stored on HDDs, you may want to store them as Parchives (parity archives). That breaks the file up into lots of smaller files, then creates additional parity files. If you break it into 100 files, and add 20 parity files, then up to 20 individual files can become corrupt before the original file becomes unrecoverable. If you keep it as a single large file, then a single bit error will corrupt the entire file.

https://en.wikipedia.org/wiki/Parchive

It won't help if the entire drive dies. But it will help protect against bit rot (bit flips due to cosmic rays, or magnetic strength deteriorating) corrupting an entire large file. (CDs and DVDs have a ton of this type of redundancy built into their encoding layer, which is why you can scratch a disc and it'll still work fine. The extra parity info that's written is sufficient to recover the data destroyed by the scratch. HDDs also have this type of redundancy at the magnetic level. But Parchives add it at the filesystem level.)
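As a sketch of the principle (not how PAR2 actually works internally: real Parchives use Reed-Solomon coding and can survive many lost blocks), here is a single-parity XOR toy in Python that can rebuild exactly one missing block:

```python
# Toy sketch of the parity idea behind Parchive. Real PAR2 uses
# Reed-Solomon coding; this XOR version tolerates exactly one
# lost block, just to show the principle.
def make_parity(blocks):
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(blocks_with_gap, parity):
    # blocks_with_gap: list where exactly one entry is None (lost)
    missing = bytearray(parity)
    for block in blocks_with_gap:
        if block is not None:
            for i, b in enumerate(block):
                missing[i] ^= b
    return bytes(missing)

data = b"0123456789abcdef"            # pretend this is a large file
blocks = [data[i:i + 4] for i in range(0, 16, 4)]
parity = make_parity(blocks)

damaged = list(blocks)
damaged[2] = None                     # block 2 is "corrupt"
print(recover(damaged, parity))       # b'89ab'
```

Because XOR is its own inverse, XOR-ing the parity block with every surviving block yields exactly the missing one; PAR2 generalizes this so that N recovery blocks can repair any N damaged blocks.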

Personally I'd go with copying files every 7-10 years to a new HDD, plus cloud storage of the most important files. That's pretty much what I do. I've got some 20,000 photos stored on my NAS. I back that up to an external HDD array once a month. Every 7-10 years I replace all the NAS drives with new ones. The NAS runs FreeNAS, which uses ZFS. ZFS has its own file protections. In addition to RAID-5-like redundancy, it also scans all files once a month to check for bit rot. If it detects a file has changed, it "heals" the damage by recovering the original file from the parity data.

The most important files are also backed up to cloud storage (I got 50 GB free for signing up with Box when they first started out). This is to protect me in case my house burns down. I want to set up a similar NAS at my sister's house, and have both of them back each other up. But thus far I've been unable to convince her she needs a NAS. :lol:


  • Google gives you 15 GB of cloud storage for free. But you get unlimited storage of photos up to 2048x2048 in resolution. You also get unlimited storage of videos, but I don't know their current restrictions. It used to be videos up to 1080p and 15 minutes in length. But I can't find a statement from Google saying what their current limits are.
  • If you subscribe to Amazon Prime, it includes unlimited cloud storage of photos of any size via Prime Photos.
  • If you subscribe to Office 365, it includes 1 TB of cloud storage on OneDrive.

Also make sure you verify files after you copy them to your archival media. Easy to do with CDs/DVDs (most burning software has an option to verify after the burn). But with HDDs, the default Windows copier doesn't support verify anymore. You need to use a different file copier. I use an older version of Teracopy (v 2.27; version 3.x had problems handling copies of tens of thousands of files). But there are lots of alternate file copiers which support verify after write. In Unix you can use rsync to verify that the copy matches the original.
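For HDD copies, a hand-rolled verify step is easy to sketch in Python, assuming the copy has already been made: hash both files and compare digests, which is roughly what verify-after-write copiers do for you.

```python
import hashlib

# Minimal verify-after-copy check: hash source and destination
# in chunks (so huge files don't need to fit in RAM) and compare.
def sha256_of(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_copy(source, destination):
    return sha256_of(source) == sha256_of(destination)
```

Saving the digests alongside the archive also lets you re-check the same files years later, which is how you'd actually catch bit rot on a shelf drive.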


128 GB sounds like a flash drive, not a HDD. Flash memory stores info by trapping a charge inside a cell. The voltage of that charge tells it whether you've stored a 0 or a 1 in the cell.

That charge slowly leaks out over time. It'll probably last a few years, but I doubt it'll last a decade. If you're using flash media for backup, I'd recommend refreshing it every year just to be safe. Completely copy everything off the flash drive, then write it all back.
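That refresh routine can be sketched in Python (the mount point in the usage line is hypothetical): copy everything off the drive, verify the staged copies, then erase and write everything back so every cell gets a fresh charge.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

# Sketch of a flash-drive refresh: stage everything off the drive,
# verify the staged copies byte-for-byte, then erase and rewrite.
def refresh_flash_drive(drive_root):
    drive = Path(drive_root)
    with tempfile.TemporaryDirectory() as staging:
        staged = Path(staging) / "staging"
        shutil.copytree(drive, staged)            # 1. copy everything off
        for src in drive.rglob("*"):              # 2. verify each file
            if src.is_file():
                dst = staged / src.relative_to(drive)
                assert (hashlib.sha256(src.read_bytes()).digest()
                        == hashlib.sha256(dst.read_bytes()).digest())
        for f in list(drive.rglob("*")):          # 3. delete originals
            if f.is_file():
                f.unlink()
        for dst in list(staged.rglob("*")):       # 4. write everything back
            if dst.is_file():
                target = drive / dst.relative_to(staged)
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(dst, target)

# refresh_flash_drive("/media/usb-archive")      # hypothetical mount point
```

The verify step before deleting anything is the important part: never erase the only copy until you've confirmed the staged copy is intact.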


The better optical media (CDs and DVDs) are known to survive 10-15 years. M-Discs perform substantially better than those in accelerated aging tests. They basically took the two primary failure modes of regular CDs/DVDs and eliminated them. (The write layer the laser burns holes into normally degrades over time due to oxidation; M-Disc uses a material which doesn't oxidize. The reflective layer can also lose reflectivity or separate/flake off with time; M-Disc doesn't have a reflective layer.)
 
Nov 12, 2012
That's sensible, about using multiple USB media from different manufacturers, and rewriting every few years. Not as easy as M-disks, but takes up a lot less space.

There are abundant cautions about NOT using USB memory for permanent storage, because the cells degrade as you write to them. But in this case, we're not doing that much writing!

But that being said, flash memory is understood to have shorter retention than common DVDs.
 

kanewolf

Titan
Moderator
A single copy of anything important is not an archive. So you will have to have multiple copies, in geographically diverse locations, to protect against a localized catastrophe. You are focusing on only one aspect of archiving. The cloud providers can create multiple copies in geographically diverse locations.
 

USAFRet

Titan
Moderator
My backup routine includes a typical HDD in a desk drawer at work.
Refreshed 2-3 times a year.
As life conditions change, that location would change as well.


But....nothing lasts as long as printed material.
After my parents' passing, going through their stuff...photo albums and paper letters from the '20s and '30s.
Pics of my grandmother partying at the Cotton Club in Harlem, etc.

Anything "digital" (if it had been possible) would have been a major pain to recreate.
If my dad had stored that on a Zip drive (1994), for instance, building up a system to read it would be a major pain or expense. And that is for me, a major geek. My adult kids today would have no clue how to start doing that.
 

Both forms of media have problems, they're just different problems.

I'm in the process of scanning my parents' photo albums. A lot of the color photos from the 1960s-1980s have faded or color-shifted. It's creating a lot of extra work for me to try to correct them. Digital photos suffer from being easier to lose, but they don't fade or color-shift, and it's trivial to duplicate them.
 

USAFRet

Titan
Moderator


Right. I'm doing the same. And yes, different failure modes.
But even a color shifted/faded pic is still valuable and 'readable'. And instantly readable.
A slightly corrupted digital file is probably trash.
 
Nov 12, 2012
I think the best answer is M-disks (for local preservation) PLUS the cloud (for distributed preservation), and maybe regular backup to a hard disk.

With regard to backups, I am reminded that fast backups, as in "smart updates", while fast, are not in the best interest of archival security. They skip files that are already present, so bits that are halfway toward degrading never get refreshed. The right strategy is a complete erase, then a complete rewrite.

But again, with regard to obsolescence (e.g., floppy disks, Zip disks, VHS and Beta tapes, movie film), there is always someone out there who will recover data from those (for a small fee). Always. Obsolescence is both a pain in the rear AND a major business opportunity. Betcha in a hundred years there will be SOMEONE out there who can read floppy disks, if there is any surviving data on them to be read.

Thank you, all. This has been a useful discussion.
 

jamvaru

Distinguished
Jan 14, 2012
2
0
18,510
It is mostly irrelevant as data changes all the time. What we find relevant today is irrelevant tomorrow.
The only legitimate reason to have long-term storage is for posterity as a species, for our descendants to know who we are or possibly whatever species comes along after we are gone.
It might be nice to save one's digital art or creations in a permanent medium. Likely one could join a collective archival organization to optimize the efficiency and effectiveness of the task. It would be a sort of permanent cloud, like perhaps a 'nebula' or 'borg cube'. Joining the 'collective' is the next logical step in permanent data storage.