AAT Embeddings using CLIP

Scripts to generate CLIP embeddings for terms from the Getty Art & Architecture Thesaurus (AAT), limited to English-language terms.

Prerequisites

Milvus 2.2.2

pymilvus

pip install lxml
pip install torch torchvision
pip install transformers
pip install pymilvus==2.2.2
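
To verify the setup, a quick connection check against Milvus can be run from Python. The host and port below assume a local default installation; adjust them for your deployment.

from pymilvus import connections, utility

# Assumes a local Milvus instance on the default port; change host/port as needed.
connections.connect(alias="default", host="localhost", port="19530")
print("Existing collections:", utility.list_collections())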

Extract relevant data from AAT to be used to generate embeddings

  1. Unzip the included AAT terms, or download newer ones from the Getty and extract them here.
  2. Run extract-data.py (a rough sketch of the extraction pattern follows this list).
  3. Check the output file aat_terms.csv.
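
The sketch below only illustrates the general pattern of streaming the AAT XML with lxml, keeping English-language terms, and writing them to aat_terms.csv. The element and attribute names are hypothetical, not the real AAT schema; extract-data.py contains the actual parsing logic.

import csv
from lxml import etree

rows = []
# Stream through the large AAT XML release without loading it all into memory.
# NOTE: "Subject", "Term", "Subject_ID" and the "lang" attribute are illustrative
# names only, not the real AAT element names.
for _, subject in etree.iterparse("aat.xml", tag="Subject"):
    subject_id = subject.get("Subject_ID")
    for term in subject.iter("Term"):
        if term.get("lang") == "en":  # keep English-language terms only
            rows.append((subject_id, term.text))
    subject.clear()  # release parsed elements to keep memory use low

with open("aat_terms.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["subject_id", "term"])
    writer.writerows(rows)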

Generate embeddings for the output terms

  • Run the script generate-embeddings.py to create a new CSV file with the embeddings (a sketch of the general approach follows below).

  • This will result in a new file, aat_terms_with_embeddings.csv.
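
As a rough sketch of the approach, CLIP text embeddings can be produced with Hugging Face transformers as shown below. The model checkpoint, batch size, and CSV column names are assumptions; generate-embeddings.py defines what is actually used.

import csv
import torch
from transformers import CLIPModel, CLIPTokenizer

model_name = "openai/clip-vit-base-patch32"  # assumption: any CLIP checkpoint works here
device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained(model_name).to(device).eval()
tokenizer = CLIPTokenizer.from_pretrained(model_name)

with open("aat_terms.csv", newline="") as f:
    rows = list(csv.DictReader(f))

with open("aat_terms_with_embeddings.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["subject_id", "term", "embedding"])
    batch_size = 256
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        inputs = tokenizer([r["term"] for r in batch], padding=True,
                           truncation=True, return_tensors="pt").to(device)
        with torch.no_grad():
            vectors = model.get_text_features(**inputs)  # (batch, 512) for ViT-B/32
        for row, vec in zip(batch, vectors.cpu().tolist()):
            writer.writerow([row["subject_id"], row["term"], vec])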

Insert the embeddings into a new Milvus collection

  • Run insert_embeddings.py to insert the embeddings into a new Milvus collection (a sketch of the Milvus calls involved follows the sample output below).

My output looks like this:

ubuntu@idios:~/AAT-CLIP-embeddings$ python3 insert_embeddings.py 
Collection 'aat_CLIP' created.
Total records to process: 56830
Inserted records 1 to 1000.
Inserted records 1001 to 2000.

...
...
Inserted records 55001 to 56000.
Inserted records 56001 to 56830.
Index created on 'embedding' field.
Data flushed to disk.
Data inserted into collection 'aat_CLIP'.
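
The Milvus side of this roughly follows the pattern sketched below. The schema, index parameters, batch layout, and 512-dimensional vector size are assumptions based on a ViT-B/32 CLIP text encoder; insert_embeddings.py defines the actual collection.

from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

connections.connect(host="localhost", port="19530")  # assumption: local Milvus

# Assumed schema: an auto-generated primary key, the term text, and a
# 512-dimensional CLIP text embedding.
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="term", dtype=DataType.VARCHAR, max_length=1024),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=512),
]
collection = Collection("aat_CLIP", CollectionSchema(fields))

# Column-based insert: one list per non-auto field (batches of 1000 in practice).
terms = ["gold", "silver"]               # placeholder data
embeddings = [[0.1] * 512, [0.2] * 512]  # placeholder data
collection.insert([terms, embeddings])

# Build a vector index on the embedding field and flush, mirroring the output above.
collection.create_index("embedding", {"index_type": "IVF_FLAT",
                                      "metric_type": "IP",
                                      "params": {"nlist": 1024}})
collection.flush()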

Notes on generating embeddings

Running with an NVIDIA A10 GPU is 5-10 times faster than on my M1 Mac. You can monitor the state of the GPU with:

watch -n 1 nvidia-smi
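
A common way to pick the fastest available torch device (CUDA on the A10, Apple's MPS backend on an M1 Mac, otherwise CPU) is the pattern below; whether the scripts here actually use MPS is an assumption, not something stated above.

import torch

# Prefer CUDA (e.g. the A10), then Apple's MPS backend (M1), then the CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
print("Using device:", device)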
