- Terraform
- Ansible (Linux only, use WSL on Windows)
- pymilvus
- Install Terraform
- Install Ansible
- Install pymilvus
- Create new venv
python == 3.9.13
- On Windows: Install Visual Studio Build Tools
- On Windows: Using Visual Studio Build Tools install: Windows 10 SDK, Windows Universl CRT SDK, MSVC v140 - VS 2015 C++ build tools, NuGet target and build tasks, .NET Framework 4.6 targeting pack, Visual Studio SDK Build Tools Core
- Install pymilvus with pip3 (If you think your python version will do and you have C++ build tools installed, try this step first)
- Create new venv
- Navigate to terraform folder
- Configure Terraform Variables
- Provide your credentials as
AWS_ACCESS_KEY_ID
andAWS_SECRET_ACCESS_KEY
environment variables - Configure settings variables in
terraform.tfvars
to your liking (you can find description of variables invariables.tf
)
- Provide your credentials as
- Run
terraform init
- download AWS API - Run
terraform apply
- create AWS resources - Write down Terraform output -
proxy_ips_private, proxy_ips_public, server_ips_public
- Navigate to ansible folder
- Configure
inventory.ini
ansible_host
= public ip's for next variables:proxy, main-1, main-2
- Provide private ip to all variables in
proxy containers
section
- Run
ansible-playbook ping.yml
- check if hosts are available - Run
ansible-playbook deploy-docker.yml
- install docker on hosts - Run
ansible-playbook deploy-milvus.yml
- download and start docker containers
- Navigate to python folder
- Configure
settings.json
proxy_ip
=proxy_ips_public
- Leave the rest untouched, already configured for 10 million
- Activate venv created for pymilvus
- Run
rewrite-pkl.py
- process dicts with non normalized vectors- Reads
data/query_embeddings.pkl
, writes normalised vectors as list todata/query_embeddings_list.pkl
- Reads
data/article_id_to_emb_dict_*.pkl
, writesdata/entries/article_vector_list_*_part_i.pkl
- Reads
- Run
connect-remote.py
- connect to Milvus proxy and show existing collections - Run
create-collection.py
- create collection configured insettings.json
. Takes ~ 120 minutes to upload 10 million vectors and 5 more hours to create index - Run
load-collection.py
- load collection to RAM - Run
query.py
- load vectors fromdata/query_embeddings_list.pkl
and perform search.
- Navigate to Terraform folder
- Change
terraform.tfvars
- Run
terraform apply
- Run
terraform refresh
- Write down Terraform output -
proxy_ips_private, proxy_ips_public, server_ips_public
- Change
- Navigate to Ancible folder
- Configure
inventory.ini
- Run
ansible-playbook deploy-milvus.yml
- Configure
- Navigate to python folder
- Configure
settings.json
- Run
load-collection.py
- Configure
- Navigate to python folder
- Prepare data as nested list:
[[articles_id], [vectors]]
and store in.pkl
file - Configure variable
filenames
inupdate-collection.py
to point at.pkl
file - Run
update-collection.py