A while ago I went on a trip to Iceland with a couple friends. Upon returning, we created a shared Google Photos album which quickly filled up with over 2000 images. On one hand, we did make that many memories. Buuuuut, on the other hand, there were many pairs of images that looked like the following:
(If you look reaaally closely, the clouds are in slightly different spots)
This is less than ideal. Every time I wish to go through this photo album and bring back memories I'll have to flip through multiple duplicates. As such, I went on a quest to purge similar looking images from the gigantic album.
Create something (most likely a script) that can go through a directory containing the photos from this trip and isolate a subset of "unique" images.
We start simple. We can cast our problem here as a "Document Distance" problem. We create some metric d that takes in two images and returns a number that represents the "distance" between them. A larger distance means the two images are not alike and a smaller distance means the two images are similar. As such, we can begin grouping similar images together and pick any representative image from each group to make up the subset of unique images.
There are definitely many issues with this, but hey, it's a start. For more details, follow me
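To make the distance-and-grouping idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: images are stood in for by flattened lists of grayscale values, `d` is a plain mean absolute pixel difference (not necessarily the metric used in the detailed write-up), and `pick_unique` and `threshold` are hypothetical names.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in: a flattened grayscale image of fixed size.
Image = Sequence[float]


def d(a: Image, b: Image) -> float:
    """Mean absolute pixel difference (an assumed metric, chosen for simplicity)."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pick_unique(images: List[Image], threshold: float,
                dist: Callable[[Image, Image], float] = d) -> List[int]:
    """Greedy grouping: keep an image only if it is farther than
    `threshold` from every image kept so far. Returns kept indices."""
    kept: List[int] = []
    for i, img in enumerate(images):
        if all(dist(img, images[k]) > threshold for k in kept):
            kept.append(i)
    return kept


photos = [
    [10, 10, 200, 200],  # some scene
    [11, 10, 199, 200],  # same scene, clouds moved slightly
    [90, 30, 40, 250],   # a different scene
]
print(pick_unique(photos, threshold=5.0))  # prints [0, 2]
```

The first image of each group acts as its representative. The threshold is the knob here: too small and near-duplicates survive, too large and distinct photos get merged.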
We go down the deep learning route as we find that someone on the internet has already solved our problem for us (or have they?). Specifically, Turi Create is a Python library that creates a model allowing us to identify similar images.
For more details, follow me
We stay on the deep learning route, but roll our own models to solve our particular problem. This was fun, and I hadn't done anything like this before, so please check it out!
We've done all the coding. Now, there's a big showdown between our ideas! Which idea will reign supreme? Click this link or this guy or even this one to find out!
The models I used didn't accept HEIF files (side note: I was annoyed to discover this after waiting for the turicreate model to build over only ~1300 of my images instead of all 2000).
My initial idea was to create a Photoshop script to bulk convert them, but unfortunately PS doesn't support the HEIF file format (unless one is on a Mac), so that was a bust.
After a bit of googling, I found Stuff Jason Does's blog post which utilized libheif to convert HEIF to JPEG and life was good again.
All you have to do is follow these 4 easy steps:
# Installation
sudo add-apt-repository ppa:strukturag/libheif
sudo apt-get update
sudo apt-get install libheif-examples
# Conversion
for f in *.heic; do heif-convert "$f" "${f%heic}jpg"; done
Note 1: uhhhh ... 4 easy steps (provided you're on a linux-based system). :-)
Note 2: This only converts the .heic files ... so, just repeat with .heif, .avif, etc. for other file types.
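If repeating the loop per extension gets tedious, a small Python wrapper can cover several extensions in one pass. This is only a sketch under assumptions: `conversion_commands` and `convert_all` are hypothetical helper names, the extension list is a guess at what your album contains, and it assumes `heif-convert` is on your PATH and is called as `heif-convert <input> <output>`, as in the loop above.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def conversion_commands(
    directory: Path,
    extensions: Iterable[str] = (".heic", ".heif", ".avif"),  # assumed list
) -> List[List[str]]:
    """Build one `heif-convert <input> <output.jpg>` command per matching file."""
    wanted = {ext.lower() for ext in extensions}
    commands = []
    for src in sorted(directory.iterdir()):
        # Match case-insensitively so IMG_0001.HEIC is picked up too.
        if src.is_file() and src.suffix.lower() in wanted:
            commands.append(["heif-convert", str(src), str(src.with_suffix(".jpg"))])
    return commands


def convert_all(directory: Path) -> None:
    """Run every command, stopping at the first failed conversion."""
    for cmd in conversion_commands(directory):
        subprocess.run(cmd, check=True)
```

Calling `convert_all(Path("."))` converts everything in the current directory. Passing each command as a list also sidesteps shell quoting problems with file names that contain spaces.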