[TAG] JPEG de-duplication
Ben Okopnik
ben at linuxgazette.net
Thu Jul 29 07:52:05 MSD 2010
On Thu, Jul 29, 2010 at 08:20:21AM +0530, Kapil Hari Paranjape wrote:
>
> What _should_ be possible (but slow!) is to write something that uses
> the magick library to convert the image into a standard bitmap (like
> ppmraw) and _then_ match signatures (or just do a bit-by-bit
> comparison). This would work fine for loss-less compression like png
> but will not be so great for lossy formats like jpeg. Moreover, there
> would be problems of comparison between vector and bitmap formats
> since the conversion to bitmap would be lossy in the former case.
Actually, for the real-world case of comparing camera-produced images, I
think we can reject any that aren't in the same format (comparing across
formats would be a much more complex task, I agree). If we're just
trying to eliminate actual copies, then that would be pretty simple:
1st pass: build a hash with file sizes as keys and lists of the files
  of that size as values
2nd pass: any lists with 2 or more files get checked for format and
  camera make/model equivalence
(optional) 3rd pass: any lists that still have 2 or more entries get
  checked for signature equivalence
The actual solution is left to the student. :)
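
For anyone who wants a starting point, here is one possible sketch of the
three passes in Python, using only the standard library. It is not a
definitive solution, and several things in it are assumptions: the names
find_duplicates, regroup, sniff_format and sha256_of are made up for
illustration, "signature" is taken to mean a SHA-256 digest of the file
contents, and the format check only looks at the leading magic bytes for
JPEG and PNG. Camera make/model comparison needs an EXIF parser, so it is
left out; the hypothetical "describe" argument is where such a check
could be plugged in.

```python
import hashlib
import os
from collections import defaultdict


def sniff_format(path):
    """Crude format guess from the leading magic bytes (JPEG and PNG only)."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    return "unknown"


def sha256_of(path, chunk_size=1 << 20):
    """Digest of the whole file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def regroup(groups, key):
    """Split each group by key(path); keep only subgroups with 2+ files."""
    result = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        result.extend(sorted(b) for b in buckets.values() if len(b) > 1)
    return result


def find_duplicates(root, describe=sniff_format):
    """Return lists of files under root that appear to be exact copies."""
    all_files = [
        os.path.join(dirpath, name)
        for dirpath, _, names in os.walk(root)
        for name in names
    ]
    groups = regroup([all_files], os.path.getsize)  # 1st pass: same size
    groups = regroup(groups, describe)              # 2nd pass: same format
    return regroup(groups, sha256_of)               # 3rd pass: same digest
```

Each pass only looks at the lists that survived the previous one, so the
expensive digest is computed for just the few files that already agree on
size and format.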
--
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *