Question: Looking for a way to find duplicate files/images, but only if they appear in a row

Status
Not open for further replies.

Cyber_Akuma

This is pretty niche, so I am not sure if there is even any possible solution for this. I have hundreds of image files that are part of an animation sequence, the vast majority of which are the exact same frame, and I need to find the duplicates. Thing is, I need to ONLY find the ones that are duplicates in a row, not ones that get duplicated later after a different image, since some of the frames of the animation repeat later.

As far as I am aware, nearly all duplicate file or photo software can't take this into account: if you tell it to find duplicates in a directory, it will just find all of them regardless of order. I need it so that, for example, if files 1-5 are the same, it will delete 2-5, then start with file 6 and scan for duplicates of that onward. But if, say, file 100 is a duplicate of 1 again, I do NOT want to delete it. I do want to delete files 101, 102, 103, etc. if those are duplicates of 100 (which would also be duplicates of 1).

Is there any software to do something like this? If it makes it easier it doesn't have to be a visual match, the duplicate files have the same hashes as well. I was thinking maybe I could just write a batch file or something to do this by their matching hashes but I would have no idea where to even start.
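
A hash-based approach along these lines could be sketched in Python rather than a batch file. This is only a minimal sketch under stated assumptions: the frames are .bmp files in one folder, their names sort in sequence order once embedded numbers are compared numerically, and SHA-256 stands in for whatever hash was actually used. The names natural_key, file_digest and consecutive_duplicates are hypothetical helpers made up for the illustration. It only lists candidates and deletes nothing.

```python
import hashlib
import re
from pathlib import Path


def natural_key(path):
    """Sort key that compares embedded numbers numerically (2 before 10)."""
    parts = re.split(r"(\d+)", path.name)
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def file_digest(path):
    """SHA-256 of the file contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def consecutive_duplicates(folder, pattern="*.bmp"):
    """Return files whose contents equal the file immediately before them."""
    files = sorted(Path(folder).glob(pattern), key=natural_key)
    duplicates = []
    previous = None
    for path in files:
        digest = file_digest(path)
        if digest == previous:
            duplicates.append(path)  # same as its predecessor: candidate
        previous = digest            # always compare against the last file
    return duplicates
```

Calling consecutive_duplicates on the frame folder returns the paths that repeat their immediate predecessor, so a frame that matches a much earlier one but follows a different image is left alone. Reviewing that list before deleting anything is the safer order of operations.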
 
What kind of image files are these? Be aware that some image formats include EXIF or other metadata that can render the hashes unequal even if the images are identical in a pixel-by-pixel comparison.

Also, what OS?

Also, are the files named with predictably increasing decimal numbers (in case you need to create a script)?
 
Bitmaps, I used a hashing app to find out that their hashes match. And Windows.

I actually managed to solve it in this case though, other than the first 19 files, every 20 files were the same, so it was easy to weed them out. But I would still like to know how to do this for the future as well for times when it might not be as easy. And yes, the files were named in increasing numbers.
 
And I will ask what is causing or creating all of the image duplications/rows of image duplications?

Software, procedural, human error?

Determine if there are ways, if any, to prevent or limit the image duplications.

Not sure what controls may be available with respect to the animation process being used.

Just a thought.
 
They are extracted frames from the animation.

"Bitmap" is not a file format but describes a group of many different file formats, including bmp, jpg, tiff, webp and many more, while image formats like svg, emf and dxf do not describe bitmap images.

Let's be honest here, nobody is going to be referring to a PNG or JPG when they call something a "bitmap file" unless they are being very technical about the basic concepts of how images are stored, instead of talking about "what kind of image file" like you asked. But if you really need me to say it, they are in BMP format.
 
Extracted frames....

Not a graphics artist, producer, etc. (full disclosure) so I can only wonder about and ask what, why, and how are those frames being extracted?

Overall, I would expect that there is some editorial purpose in extracting frames.

However, I would also expect that sometimes such extractions would not be needed and thus extraction per se would be turned off or otherwise disabled.

= = = =

Also: It is important to know the file type if the end expectation is to identify and remove the duplicate files.

Removing duplicate files is a separate process that can be managed by any number of file management applications, utility programs, or even scripts via PowerShell.

So there are two immediate paths: 1) Prevent the duplications from happening, and 2) use file management methods to find and remove the duplicates.
 
Let's be honest here, nobody is going to . . . .
OK, you're probably right. Why not simply say "bmp" file type, and also what resolution and color space?

Also, when you use the term "identical", have you checked whether any pixels in a frame differ slightly from the previous frame? For example, in an image editor like GIMP, put the two images on different layers, set the top layer's blend mode to "Difference", merge down, and then apply a strong gamma adjustment to reveal any differences.
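
For a quick scripted version of the same check, the pixel data of two frames can be compared byte by byte. The following is a rough sketch only, assuming uncompressed BMP files with the same dimensions and bit depth; pixel_bytes and count_differing_bytes are hypothetical helper names, and this is a stand-in for the layer-difference procedure, not a replacement for looking at the images.

```python
import struct


def pixel_bytes(data):
    """Return the pixel array of a BMP, given the whole file as bytes."""
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    # Bytes 10-13 of the BMP file header store the pixel array offset.
    (offset,) = struct.unpack_from("<I", data, 10)
    return data[offset:]


def count_differing_bytes(bmp_a, bmp_b):
    """Count positions at which two equally sized pixel arrays differ."""
    a, b = pixel_bytes(bmp_a), pixel_bytes(bmp_b)
    if len(a) != len(b):
        raise ValueError("pixel arrays differ in size")
    return sum(x != y for x, y in zip(a, b))
```

A result of 0 means the pixel arrays are byte-identical even if header fields differ; any other value is the number of differing bytes. Each argument is the full file content, for example what Path(...).read_bytes() returns for a frame.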
 
Well, since I saw that the frames that look similar all had the same hashes, I assumed they must be the exact same image, since even if a single pixel was different the hash would be completely different, wouldn't it?
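
That expectation is easy to check for a cryptographic hash. A small sketch, assuming SHA-256 (the thread does not say which hash was used) and using 1000 zero bytes as a stand-in for a frame:

```python
import hashlib

frame = bytes(1000)          # stand-in for a frame: 1000 zero bytes
changed = bytearray(frame)
changed[500] ^= 1            # flip a single bit in one byte

digest_a = hashlib.sha256(frame).hexdigest()
digest_b = hashlib.sha256(bytes(changed)).hexdigest()

print(digest_a)
print(digest_b)
print("identical" if digest_a == digest_b else "different")
```

The two digests come out unrelated, so matching hashes are strong evidence that the files are byte-for-byte the same.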
 