Feat task 0702 find enhancements #1072
Conversation
- detect replaced files already in data store
- store column names in the pickle
        hash (str): digest with sha1 algorithm
    """
    with open(file_path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
IMO this is quite dangerous.
It reads the whole image file into memory and eventually passes the entire content to the hashing function. For extremely large (possibly malicious) image files this could cause an OOM, depending on available memory.
It would be better to read and hash the file in chunks.
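For reference, a minimal sketch of the chunked approach; the standalone function shape and the 64 KB chunk size are illustrative choices, not the exact code in this PR:

```python
import hashlib

def find_hash_of_file(file_path: str, chunk_size: int = 65536) -> str:
    """Hash a file with sha1 without loading the whole content into memory."""
    sha1 = hashlib.sha1()
    with open(file_path, "rb") as f:
        # read fixed-size chunks until EOF so memory usage stays bounded
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
    return sha1.hexdigest()
```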
Do you recommend doing something like this: https://stackoverflow.com/a/64994148/7846405
Exactly.
As a bonus for safety, I would also include a configurable filter in the dataset traversal to exclude files whose size exceeds a certain threshold: in other parts deepface loads the whole file into memory with cv.imread() anyway.
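A sketch of such a filter in the dataset traversal; the function name, the max_file_size_mb parameter, and the extension list are assumptions for illustration:

```python
import os

def list_images_under_limit(root_dir: str, max_file_size_mb: float = 10.0) -> list:
    """Collect image paths, skipping files above a configurable size threshold."""
    max_bytes = max_file_size_mb * 1024 * 1024
    images = []
    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            if not name.lower().endswith((".jpg", ".jpeg", ".png")):
                continue
            path = os.path.join(dirpath, name)
            # os.path.getsize checks the size without opening the file
            if os.path.getsize(path) <= max_bytes:
                images.append(path)
    return images
```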
Yeah, having a file size threshold is wise. But I will hash file properties instead of the file content itself.
alpha_hash = current_representation["hash"]
beta_hash = package_utils.find_hash_of_file(identity)
if alpha_hash != beta_hash:
    logger.debug(f"Even though {identity} represented before, it's replaced later.")
    replaced_images.append(identity)
This is a bit expensive.
You compute the hash of a file here, which is eventually discarded.
The file with the unmatched hash is then added to the list of new files passed to _find_bulk_embeddings, where its hash is recomputed once again.
I suggest, instead of hashing the whole content of the file, hashing only its most immediate (and lighter) properties: name, creation timestamp, last modification timestamp, and size.
It is quite unlikely that a completely different image overwrites the original one while keeping the very same attributes.
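A sketch of what such a property-based hash could look like, assuming os.stat metadata is used; note that st_ctime means creation time on Windows but metadata-change time on Linux:

```python
import hashlib
import os

def find_hash_of_file_properties(file_path: str) -> str:
    """Hash cheap file metadata (path, timestamps, size) instead of the content."""
    stats = os.stat(file_path)
    properties = f"{file_path}_{stats.st_ctime}_{stats.st_mtime}_{stats.st_size}"
    return hashlib.sha1(properties.encode("utf-8")).hexdigest()
```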
I like that approach and I'm convinced. I will do it in another PR soon.
        expand_percentage=expand_percentage,
    )
except ValueError as err:
    raise ValueError("Exception while processing img1_path") from err
Tickets
- find: #1061

What has been done
With this PR:
1- some improvements were made in the find function: detect replaced images, store column names in the pickle.
2- upper case links are not discarded anymore.
3- pre-calculated embeddings are supported in the verify function (sketched below).
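For item 3, usage presumably looks like the sketch below: the embedding is computed once with DeepFace.represent and then passed to verify in place of an image path. Whether verify accepts a plain list of floats exactly like this should be confirmed against the PR diff:

```python
from deepface import DeepFace

# compute an embedding once ...
embedding_objs = DeepFace.represent(img_path="img1.jpg", model_name="Facenet")
img1_embedding = embedding_objs[0]["embedding"]

# ... and reuse it in verify instead of re-reading and re-embedding the image
result = DeepFace.verify(
    img1_path=img1_embedding,
    img2_path="img2.jpg",
    model_name="Facenet",
)
print(result["verified"])
```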
How to test