The plan is to use Sentence Transformers to create embeddings that we can later look up with Annoy nearest-neighbor search to find relevant files. We might later need to cache the embeddings (perhaps in SQLite) to speed up startup.
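A minimal sketch of that idea (the model name and the 768-dimensional vectors match msmarco-distilbert-base-tas-b; the Annoy metric and tree count here are assumptions, not necessarily what main.py uses):

    from annoy import AnnoyIndex
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-tas-b")
    texts = ["how to set up ssh keys", "grocery list", "annoy build notes"]

    # Embed each note and add it to an Annoy index (768 = DistilBERT hidden size).
    index = AnnoyIndex(768, "angular")  # metric is an assumption; "dot" also fits TAS-B
    for i, text in enumerate(texts):
        index.add_item(i, model.encode(text))
    index.build(10)  # 10 trees

    # Embed the query and fetch the nearest notes.
    ids = index.get_nns_by_vector(model.encode("ssh"), 2)
    print([texts[i] for i in ids])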
Optionally, make a models directory and, inside it, git clone the embedding model and the cross-encoder model.
The folder structure (tree -d output) should look like:

    models/
    ├── msmarco-distilbert-base-tas-b
    │   └── 1_Pooling
    └── ms-marco-TinyBERT-L-2
Remember to "git-lfs pull" after git cloning to get the model files. The main.py automatically checks these two folders before trying to load the models from the internet
Usage:
source venv/bin/activate
First, create the embeddings for the notes and the Annoy index with the following:
python3 main.py build (notes-dir) (data directory name)
ex. python3 main.py build ~/Notes notes
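Roughly, the build step amounts to the following sketch: walk the notes directory, embed each file, and write the Annoy index plus the file list into the data directory (the file names index.ann and files.pkl are hypothetical, not necessarily what main.py writes):

    import pickle
    from pathlib import Path

    from annoy import AnnoyIndex
    from sentence_transformers import SentenceTransformer

    def build(notes_dir: str, data_dir: str) -> None:
        model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-tas-b")
        files = sorted(p for p in Path(notes_dir).rglob("*") if p.is_file())

        # One Annoy item per note, keyed by its position in the file list.
        index = AnnoyIndex(768, "angular")
        for i, path in enumerate(files):
            index.add_item(i, model.encode(path.read_text(errors="ignore")))
        index.build(10)  # more trees = better recall, slower build

        out = Path(data_dir)
        out.mkdir(exist_ok=True)
        index.save(str(out / "index.ann"))         # hypothetical file name
        with open(out / "files.pkl", "wb") as fh:  # hypothetical file name
            pickle.dump([str(p) for p in files], fh)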
Do semantic search on the notes with the following:
python3 main.py search "(query string)" (data directory name)
ex. python3 main.py search "ssh" notes
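Under the hood, a search like the example above can be sketched as: embed the query, pull candidates from the Annoy index, then rerank them with the cross-encoder. The rerank step and the file names below are assumptions about how main.py uses the cloned ms-marco-TinyBERT-L-2 model:

    import pickle
    from pathlib import Path

    from annoy import AnnoyIndex
    from sentence_transformers import CrossEncoder, SentenceTransformer

    def search(query: str, data_dir: str, top_k: int = 10):
        model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-tas-b")
        reranker = CrossEncoder("cross-encoder/ms-marco-TinyBERT-L-2")

        index = AnnoyIndex(768, "angular")
        index.load(str(Path(data_dir) / "index.ann"))          # hypothetical file name
        with open(Path(data_dir) / "files.pkl", "rb") as fh:   # hypothetical file name
            files = pickle.load(fh)

        # Annoy gives cheap approximate candidates; the cross-encoder reranks them.
        ids = index.get_nns_by_vector(model.encode(query), top_k)
        pairs = [(query, Path(files[i]).read_text(errors="ignore")) for i in ids]
        scores = reranker.predict(pairs)
        ranked = sorted(zip((files[i] for i in ids), scores),
                        key=lambda x: x[1], reverse=True)
        return [path for path, _ in ranked]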