Question: Duplicate file finder recommendations?

Status
Not open for further replies.

elsmandino

Distinguished
Jul 16, 2009
50
2
18,530
Hi there.

Can anyone recommend any free programs (Windows or Linux) that can find and delete files that are exactly identical by content (i.e. byte by byte), please?

I have had to recover a load of files from a wiped hard drive with PhotoRec. The files have all lost their file names, so I need to eliminate all exact duplicates before working out what to do with the rest.

Thanks very much.
 

I have heard good things about the above app, but have not used it. I think there are free and paid versions.

My personal experience with dupe finder programs is limited to finding duplicate pictures only, so I can't help with your specific problem.

I don't envy you....that sounds like a terrible problem to have.
 

Math Geek

Titan
Ambassador
Not sure what program will do it, but the answer lies in hashes of the files. Exact copies will produce the same hash. Create hashes for all the files, group by hash and delete the extras.
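For anyone who wants to see the idea in code, here is a minimal sketch in Python using only the standard library. The choice of SHA-256, the function names `file_sha256` and `find_duplicate_groups`, and the size pre-filter are my own assumptions for illustration, not how any particular dupe finder works. It only reports groups and deletes nothing.

```python
import hashlib
import os
from collections import defaultdict


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root):
    """Walk `root` and return lists of paths whose contents hash identically."""
    # Files of different sizes cannot be identical, so bucket by size first
    # and only hash files that share a size with at least one other file.
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_sha256(path)].append(path)

    # Keep only hashes that occur more than once; sort for stable output.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Calling `find_duplicate_groups` on the folder of recovered files returns one list per set of identical files. You could keep the first path in each list and review the rest before deleting anything.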

There's bound to be a program that can do that automatically. I'm not the first person to think it up for sure :)

Renaming is a whole other thing. Organizing them then naming thousands of files is never fun. I had to do it with about 20k mp3 files once. Took much long time
 

punkncat

Polypheme
Ambassador
I have had terrible luck over the years with "apps" of that type. I HIGHLY suggest you make sure you have a good and secure backup of all of the files as a whole (somewhere else) before starting.

There are various options in RoboCopy that are supposed to do this as you move files. My results have varied greatly.
 

elsmandino

Thanks guys - yep, a real pain and a reminder to have a decent backup system in place!

Out of interest, is a file hash totally unique to that file or is it possible to have different files that just happen to have the same hash?
 

Math Geek

Titan
Ambassador
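Hashes are unique for all practical purposes. In theory two different files can end up with the same hash (a collision), since there are far more possible files than possible hashes, but with a strong hash such as SHA-256 the chance of that happening by accident is negligible. If you change 1 bit of the file you get a completely new hash.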
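If you want to be completely certain before deleting anything, a byte-by-byte comparison can confirm that two files with matching hashes really are identical. A minimal sketch in Python, assuming the standard-library `filecmp` module; the function name `confirmed_identical` is just an illustrative choice.

```python
import filecmp


def confirmed_identical(path_a, path_b):
    """Return True only if the two files match byte for byte."""
    # shallow=False forces a comparison of the file contents instead of
    # trusting os.stat() metadata such as size and modification time.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Running this only on pairs that already share a hash keeps the extra reading to a minimum.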

Hashes are also how passwords get checked on the web. A site typically stores a hash of your password rather than the password itself, hashes what you type when you log in, and compares the two. It's also how law enforcement examines HDDs for illegal content: they have huge databases of hashes of known files and compare them against hashes of what's on the drive. And so on and so on.
 

elsmandino

Thanks Math Geek.

One other quick thing - if I have two identical files but they have different names, will they have the same hash?

I.e. does filename count towards a file's hash?
 

Math Geek

Titan
Ambassador
No, the filename does not factor into the hash. It's totally dependent on the contents of the file itself.

Try it out for fun. Take a .doc file, hash it, then add a space to the end of the text. Save and rehash. Totally different hash. Or edit 1 pixel out of a pic and so on. It is rather amazing how random the hash looks each time.
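The same experiment can be run as a small Python script. This is only a sketch: the file names are made up, the files are plain bytes rather than real .doc files, and SHA-256 is my choice of hash.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Hash a (small) file's contents; the path itself is never hashed."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


with tempfile.TemporaryDirectory() as folder:
    # Two files with identical contents but different (hypothetical) names.
    first = os.path.join(folder, "report.doc")
    second = os.path.join(folder, "f123456.doc")
    for path in (first, second):
        with open(path, "wb") as handle:
            handle.write(b"same contents")

    same_despite_names = sha256_of(first) == sha256_of(second)

    # Append a single space to one of them and hash again.
    with open(second, "ab") as handle:
        handle.write(b" ")
    different_after_edit = sha256_of(first) != sha256_of(second)

print(same_despite_names)    # True
print(different_after_edit)  # True
```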
 

elsmandino

Hi there.

Further to the above, I am giving dupeGuru a go.

As per my original post, I have just finished recovering masses of files from a 14TB hard drive that I accidentally wiped. All the files are spread over 4 smaller hard drives.

As recommended, I am using dupeGuru to get rid of all the duplicates but am a bit unsure as to what options to use.

I have set it to scan standard content but there are a number of extra options that I am not sure of:

* Use regular expressions when filtering
* Partially hash files bigger than X MB
* Ignore duplicates hardlinking to the same file

All three of these options are unticked by default. What do each of these options mean and is there any reason at all to enable them?
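For context on two of the terms above, here is a rough Python sketch of what "hardlinking to the same file" and "partially hashing" mean in general. It is not taken from dupeGuru and may not match what its options actually do. The function names and the sampling scheme (file size plus the first and last chunk) are assumptions for illustration.

```python
import hashlib
import os


def is_same_underlying_file(path_a, path_b):
    """True if both paths refer to one file on disk, e.g. two hard links."""
    # Hard links are extra names for the same data, so deleting one of them
    # frees no space. Note that samefile() also follows symbolic links.
    return os.path.samefile(path_a, path_b)


def partial_sha256(path, sample_size=1024 * 1024):
    """Hash the file size plus the first and last `sample_size` bytes only."""
    size = os.path.getsize(path)
    digest = hashlib.sha256(str(size).encode())
    with open(path, "rb") as handle:
        digest.update(handle.read(sample_size))
        if size > sample_size:
            # Jump to the tail without re-reading bytes already hashed.
            handle.seek(max(size - sample_size, sample_size))
            digest.update(handle.read(sample_size))
    return digest.hexdigest()
```

A partial hash is faster on very large files, but a match is only a hint: two files that differ somewhere in the middle would still produce the same partial hash under this scheme, so a full hash or byte-by-byte check is the safer basis for deleting anything.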
 