Pydantic v2 Elasticsearch #34

Merged
merged 5 commits into from
Jul 17, 2023
@prrao87 prrao87 commented Jul 17, 2023

Updates

Updated the Elasticsearch API code to use Pydantic v2. As with Meilisearch, we attach the running event loop to a multi-process pool of workers, so that Pydantic validation and bulk indexing run concurrently across multiple batches. The Elasticsearch async client lets us do this efficiently in a non-blocking manner. Timing numbers on an M2 MacBook Pro are shown below.

$ cd dbs/elasticsearch/scripts
$ time python bulk_index.py
Found index wines in db, skipping index creation...

Processing chunks
Processed ids in range 1-10000
Processed ids in range 10001-20000
Processed ids in range 50001-60000
Processed ids in range 60001-70000
Processed ids in range 30001-40000
Processed ids in range 20001-30000
Processed ids in range 40001-50000
Processed ids in range 90001-100000
Processed ids in range 70001-80000
Processed ids in range 80001-90000
Processed ids in range 120001-129971
Processed ids in range 100001-110000
Processed ids in range 110001-120000
Finished execution!
python bulk_index.py  7.03s user 0.82s system 115% cpu 6.787 total
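
For readers unfamiliar with the pattern, here is a minimal sketch of how an event loop can hand CPU-bound validation to a process pool while bulk requests are awaited. It is not the code in `bulk_index.py`: `validate_chunk` is a hypothetical stand-in for Pydantic v2 validation, `bulk_index` is a hypothetical stand-in for a call through the Elasticsearch async client, and the record fields are made up for illustration. It uses only the standard library.

```python
import asyncio
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Any

Record = dict[str, Any]


def chunk_iterable(items: list[Record], chunk_size: int) -> list[list[Record]]:
    """Split the records into consecutive chunks of at most chunk_size."""
    return [items[i : i + chunk_size] for i in range(0, len(items), chunk_size)]


def validate_chunk(chunk: list[Record]) -> list[Record]:
    """CPU-bound step. Assumption: stands in for Pydantic v2 model validation."""
    return [
        {"id": int(record["id"]), "title": str(record["title"]).strip()}
        for record in chunk
    ]


async def bulk_index(chunk: list[Record]) -> int:
    """I/O-bound step. Assumption: stands in for an async bulk request."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return len(chunk)


async def process_chunk(chunk: list[Record], executor: Executor) -> int:
    loop = asyncio.get_running_loop()
    # Run validation in the pool without blocking the event loop
    validated = await loop.run_in_executor(executor, validate_chunk, chunk)
    count = await bulk_index(validated)
    print(f"Processed ids in range {validated[0]['id']}-{validated[-1]['id']}")
    return count


async def run_pipeline(records: list[Record], chunk_size: int, executor: Executor) -> int:
    chunks = chunk_iterable(records, chunk_size)
    # Schedule every chunk at once; each finishes whenever its work completes
    counts = await asyncio.gather(*(process_chunk(c, executor) for c in chunks))
    return sum(counts)


if __name__ == "__main__":
    data = [{"id": str(i), "title": f" wine {i} "} for i in range(1, 25_001)]
    with ProcessPoolExecutor() as pool:
        total = asyncio.run(run_pipeline(data, 10_000, pool))
    print(f"Indexed {total} records")
```

Because all chunks are scheduled together and each one finishes independently, the "Processed ids" lines can appear out of order, as they do in the run above.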

@prrao87 prrao87 merged commit 85ab928 into main Jul 17, 2023