Use image retrieval techniques to find similar images #27
Comments
#66 should be implemented first.
Idea for the UI: If this feature is active (which is optional or disabled if not enough training data is available), the grid of image patches in MAIA is split vertically (e.g. 80% rows showing the regular patches, 20% rows showing patches suggested by this method). This way the original MAIA workflow is still possible even if this method performs poorly for a given use case.
This can be done with the image features and similarity search implemented for biigle/core#336. The function should be available for training proposals and annotation candidates.
Next idea for the UI: The selected proposal/candidate is shown, fixed and highlighted at the first position in the grid. The remaining grid items are sorted according to the similarity to the patch. They scroll and can be interacted with as usual. The filtering can be enabled with a hover button on each patch. It can be disabled with a button on the highlighted fixed patch.
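To make the ordering idea concrete, here is a minimal sketch of the sorting step in plain Python. It assumes each patch already has a feature vector and uses cosine similarity as the measure; the function names, the toy features and the choice of cosine similarity are illustrative assumptions, not what BIIGLE implements.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def sort_grid_by_similarity(features, selected_id):
    """Return patch IDs with the selected patch first, followed by the
    remaining patches ordered from most to least similar to it."""
    reference = features[selected_id]
    others = [pid for pid in features if pid != selected_id]
    others.sort(
        key=lambda pid: cosine_similarity(reference, features[pid]),
        reverse=True,
    )
    return [selected_id] + others


# Hypothetical toy features; real ones would come from a feature extractor.
features = {
    1: [1.0, 0.0],
    2: [0.0, 1.0],
    3: [0.9, 0.1],
    4: [-1.0, 0.0],
}
order = sort_grid_by_similarity(features, selected_id=1)
```

Disabling the filter would simply restore the original grid order, so the regular workflow is unaffected.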
Updated the title to make clear that this should be implemented both for training proposals and annotation candidates.
With the student experiments based on Dino features and #96 done, this can move forward now.
I want to pick this up again. New thoughts:
Here is a notebook with a minimal feature-extraction example with DINOv2: https://colab.research.google.com/drive/1LbtYkzdOezl2SadyxCRJFYhLd_aQNjlq?usp=sharing
Thinking about it, maybe I prefer decoupling the vector database from our main database. With MAIA and Largo it's easy to implement cleanup of vector database rows, since the annotation/candidate/proposal patch files are also cleaned. Cleanup can be asynchronous as well. This has the advantage that the vector DB does not have an impact on the regular DB backups. It can have its own (less frequent) backups and be run on a different host. Laravel can work with different database connections (also for migrations). We only need to sync (and index) the model IDs from the regular DB to the vector DB, but this shouldn't be a problem. I'll still stick with pgvector, as I don't want to introduce a new technology to the stack.
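As a rough sketch of what the decoupled setup implies, the helpers below build a pgvector nearest-neighbour query and compute which vector rows became orphaned after cleanup in the main database. The table name `maia_feature_vectors`, the column name `embedding` and all function names are hypothetical; only the `<=>` cosine distance operator and the `[x,y,...]` vector text format come from pgvector itself.

```python
def to_vector_literal(vector):
    """Format a feature vector in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in vector) + "]"


def build_similarity_query(table, limit):
    """Build a parameterized query for the patches nearest to a query vector.

    `<=>` is pgvector's cosine distance operator, so ascending order means
    most similar first. Placeholders use the `%(name)s` style of psycopg.
    The column name `embedding` is an assumption.
    """
    if not table.isidentifier():
        raise ValueError("unexpected table name")
    return (
        f"SELECT id FROM {table} "
        "WHERE id != %(selected_id)s "
        "ORDER BY embedding <=> %(query_vector)s::vector "
        f"LIMIT {int(limit)}"
    )


def find_orphaned_ids(main_db_ids, vector_db_ids):
    """IDs still present in the vector DB but already deleted from the main DB."""
    return sorted(set(vector_db_ids) - set(main_db_ids))


query = build_similarity_query("maia_feature_vectors", limit=10)
params = {"selected_id": 1, "query_vector": to_vector_literal([1, 0.5])}
orphans = find_orphaned_ids(main_db_ids=[1, 2, 3], vector_db_ids=[2, 3, 4, 5])
```

Because the two databases only share model IDs, an asynchronous job could periodically delete the orphaned IDs from the vector DB without touching the main database.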
References #27 References biigle/core#670
More like a nice to have.
I just browsed through the results of novelty detection. Unfortunately, the classes are quite scattered, so selection takes some time. In addition, some classes are much more abundant than others, so the rare classes might be "lost" in the downstream steps. It would be nice to have a "show me more thumbnails that look like this one" mechanism. Algorithms for that are available in image retrieval. We could, for example, use MPEG-7 features or something similar to create a tree structure from the data to make it easier to browse. Creation of that structure shouldn't take much time or resources.