Question: Deduplicate Files From Several Drives

aggielaw

Distinguished
Apr 13, 2009
Hi there, forum members. I'm looking for your help archiving 25 years of data.

I have a lot of data from the last 25 years scattered across a dozen hard drives, two "backup" drives, and a Synology DS723+ NAS. Long story short, the "backup" drives are my attempt at aggregating the contents of the 12 old drives, and the NAS is my attempt at aggregating and backing up the "backup" drives. The backup drives have significant overlapping data but are not identical, and the NAS has most, but not all, of the files on the backup drives. I kept the 12 old, small drives after outgrowing them, and they may have data that didn't get backed up when I moved to newer, larger drives.

I'm now at a point where I've nearly filled the 8TB drives in my NAS. It's time to clean up the NAS disks to figure out how much overlapping data I have on there, delete the duplicate files, and hopefully make room for the data on other drives that hasn't been backed up to the NAS yet.
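A common approach (not something described in the thread itself) is to find duplicates by file contents rather than by name: group files by size, then hash only the files that share a size with another file. Below is a minimal Python sketch, assuming the NAS share is reachable as a local path (mounted on a PC, or with Python running on the NAS). `sha256_of` and `find_duplicates` are hypothetical helper names. It only reports duplicate groups and deletes nothing.

```python
import hashlib
import os
import sys
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's contents in 1 MiB chunks so large videos need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups of paths under root whose contents are byte-for-byte identical.

    Filenames are ignored, so renamed copies are still grouped together.
    """
    # Pass 1: group by size; a file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so one file is not counted twice
            by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share their size with at least one other file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_hash[sha256_of(path)].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Report only: choosing which copy to keep is left to a human.
    for group in find_duplicates(sys.argv[1]):
        print("keep?     ", group[0])
        for extra in group[1:]:
            print("duplicate:", extra)
```

Run it with the share's path as the only argument. The size pre-check matters on a mostly-photo archive: most files have a unique size and never need to be read in full.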

What is your recommended solution for deduplicating files on the NAS, and what do you recommend as a solution for comparing the contents of the backup drives and the dozen old drives to the deduplicated NAS drives to ensure that 1) I get every file I've ever preserved onto the NAS, and 2) I have no duplicate files taking up storage on the NAS drives?
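For the second part (making sure every file on the old and backup drives ends up on the NAS), the same content-hash idea can be turned around: build a set of hashes for everything already on the NAS, then list files on each other drive whose hash is not in that set. A minimal sketch, assuming each drive is mounted at a local path; the function names are hypothetical. Because only contents are compared, a renamed copy that is already on the NAS counts as present.

```python
import hashlib
import os
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def content_hashes(root):
    """Set of SHA-256 digests for every regular file under root."""
    hashes = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                hashes.add(sha256_of(path))
    return hashes


def missing_from_nas(source_root, nas_hashes):
    """Paths under source_root whose contents exist nowhere on the NAS."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path) and sha256_of(path) not in nas_hashes:
                missing.append(path)
    return sorted(missing)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python missing.py NAS_PATH DRIVE_PATH [DRIVE_PATH ...]
    nas = content_hashes(sys.argv[1])
    for drive in sys.argv[2:]:
        for path in missing_from_nas(drive, nas):
            print(path)
```

Hashing the NAS reads every byte once, which for several TB will likely take hours; saving the hash set to a file between runs would avoid re-reading the NAS for each of the old drives.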

Finally, although I run my NAS in RAID 1, I would like to have a backup in case both drives fail, are hacked or ransomed, or are destroyed. What solution would you recommend for backing up the contents of the NAS?

Many thanks in advance for your advice.

hc
 
Suppose you have 13 copies of a particular picture of grandpa's '57 Chevy.

Do they all have the same name? Or 13 names?

In GB or TB, what is the sum of all files on all drives, duplicates included?

In GB or TB, what is your best estimate of the sum of what ONE copy of each file would be? That is, the same as the previous question, but with zero duplicates.

Are these mostly text files? Mostly pictures? Mostly OS and software related?
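Rather than estimating by eye, a quick tally of bytes per file extension answers the "mostly pictures or mostly software?" question directly. A minimal sketch; `bytes_by_extension` is a hypothetical name, and the extension is only a rough proxy for file type.

```python
import os
import sys
from collections import Counter


def bytes_by_extension(root):
    """Total bytes per lower-cased file extension under root ("(none)" if no extension)."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            ext = os.path.splitext(name)[1].lower() or "(none)"
            totals[ext] += os.path.getsize(path)
    return totals


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the ten extensions using the most space, in decimal GB.
    for ext, size in bytes_by_extension(sys.argv[1]).most_common(10):
        print(f"{ext:10} {size / 1e9:10.2f} GB")
```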
 
Hi there, Lafong. Thanks for starting to steer me toward a good solution.

I'm sure there are some files that have been renamed. I'd guess no more than 200 files are in that group.

I think the sum of all files, duplicates included, is 12-14TB. My best guess, which I have only marginal confidence in, is that a complete set of files with no duplicates would be in the 6-7TB range.
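Since the 6-7TB figure is a guess, it can be measured instead: sum the size of every file (all copies) and, separately, the size of each distinct content once, keyed by its SHA-256 hash. A minimal sketch, assuming all drives are mounted at local paths passed as arguments; the function names are hypothetical. It reads every byte of every file, so on 12-14TB it will be slow.

```python
import hashlib
import os
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def size_report(roots):
    """Return (total_bytes, unique_bytes) for all regular files under the given roots.

    total_bytes counts every copy; unique_bytes counts each distinct content once.
    """
    total = 0
    size_by_hash = {}  # SHA-256 digest -> size of that content
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    continue
                size = os.path.getsize(path)
                total += size
                size_by_hash[sha256_of(path)] = size
    return total, sum(size_by_hash.values())


if __name__ == "__main__" and len(sys.argv) > 1:
    total, unique = size_report(sys.argv[1:])
    # Decimal terabytes, matching how drive capacities are marketed.
    print(f"all copies:  {total / 1e12:.2f} TB")
    print(f"one of each: {unique / 1e12:.2f} TB")
```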

By type, the vast majority of files are photos, followed by audio files, then software files, and a few hundred video files.

One problem I might run into that I didn't think about before is the audio files. I have multiple copies of many songs. In many cases, I own different releases of the same album, or I have live versions of songs I also have on studio albums. I wouldn't want to automate this so fully that I delete files that share a filename but are not actually duplicates.
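Hashing by contents already avoids that problem: a live take and a studio take saved under the same filename have different bytes, so they are never treated as duplicates. To review exactly those cases by hand, one option is to list filenames that occur with more than one distinct content. A minimal sketch; the function name is hypothetical, and comparing names case-insensitively is an assumption.

```python
import hashlib
import os
import sys
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_name_different_content(root):
    """Filenames (case-insensitive) occurring under root with more than one distinct content.

    Returns {name: {digest: [paths]}}. These are different recordings or releases
    that share a name, not duplicates, and are meant for manual review.
    """
    by_name = defaultdict(lambda: defaultdict(list))
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            by_name[name.lower()][sha256_of(path)].append(path)
    return {name: dict(v) for name, v in by_name.items() if len(v) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, versions in sorted(same_name_different_content(sys.argv[1]).items()):
        print(name)
        for digest, paths in versions.items():
            print("  ", digest[:12], *paths)
```

The flip side is that byte-level hashing is conservative: two rips of the same track at different bitrates, or the same MP3 with edited tags, also hash differently and will not be reported as duplicates. Catching those would need audio-aware tools rather than plain hashing.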

Thanks for your advice.