[TAG] JPEG de-duplication
ny at youngman.org.uk
Mon Jul 26 00:19:34 MSD 2010
A family member has a number of directories containing photos in JPEG format.
Three directories contain different versions of the same collection of photos.
One is the current master and the others are earlier snapshots of the same
collection. I believe that all the photos in the older snapshots are present
in the current master, but I would like to verify that before I delete them.
Also, many other directories probably contain duplicates of photos in the
master collection, and I would like to clean those up.
Identifying and cleaning up byte-for-byte identical JPEGs in the snapshots has
freed up a considerable amount of disk space. A sample of the remaining
photos suggests that they are probably in the master, but that their tags and
positions in the directory tree have changed. I don't want to go through
comparing them all by hand.
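
For reference, the byte-for-byte pass can be sketched in a few lines of Python.
This is only an illustration of the idea, not the exact procedure used here:
the function names and the list of file suffixes are assumptions, and the
script only reports groups of identical files, it does not delete anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large photos are never loaded whole into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_files(roots, suffixes=(".jpg", ".jpeg")):
    """Group JPEG files under the given directories by content hash.

    Returns a dict mapping digest -> list of paths, keeping only the
    digests shared by more than one file.
    """
    groups = defaultdict(list)
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file() and path.suffix.lower() in suffixes:
                groups[file_sha256(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

A hash of the whole file changes as soon as a single tag is edited, which is
why this pass cannot catch the remaining photos.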
Initial research suggests that ImageMagick can produce a "signature", which is
a SHA256 checksum of the image data. I believe that this would be suitable
for identifying identical images on which only the tags have been altered.
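
A rough sketch of how that check might look in Python, assuming ImageMagick's
"identify" command is on the PATH and that its "%#" format escape prints the
signature. The function names are made up for illustration. Since the
signature is computed from the decoded pixels, it should ignore tag changes
but would not match a photo that has been re-compressed, resized or rotated,
and it is probably safest to compare only signatures produced by the same
ImageMagick installation.

```python
import subprocess


def imagemagick_signature(path):
    """Return ImageMagick's signature for one image file.

    Assumes the `identify` executable is installed; newer installs may
    need `magick identify` instead.
    """
    result = subprocess.run(
        ["identify", "-format", "%#", str(path)],
        capture_output=True,
        text=True,
        check=True,  # raise if identify cannot read the file
    )
    return result.stdout.strip()


def missing_from_master(master_paths, snapshot_paths,
                        signature=imagemagick_signature):
    """List snapshot files whose signature matches no file in the master.

    `signature` is any callable mapping a path to a comparable value, so
    a different fingerprint can be swapped in without changing the logic.
    """
    master_signatures = {signature(p) for p in master_paths}
    return [p for p in snapshot_paths
            if signature(p) not in master_signatures]
```

If missing_from_master() returns an empty list for a snapshot, every image in
it has a pixel-identical counterpart in the master under this assumption.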
Are there any graphics experts in the gang who can confirm this? Alternatively,
suggestions of existing tools that will do the job, or better approaches,
would be most welcome.