-
Notifications
You must be signed in to change notification settings - Fork 406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add search engine reindexing cli task #4404
Conversation
The URL of the deployed environment for this PR is https://argilla-quickstart-pr-4404-ki24f765kq-no.a.run.app |
|
||
async def _reindex_datasets(db: AsyncSession, search_engine: SearchEngine, progress: Progress) -> None: | ||
task = progress.add_task( | ||
f"reindexing datasets...", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you can specify that only the Feedback datasets will be reindexed. Just to clarify to users
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
await search_engine.delete_index(dataset) | ||
await search_engine.create_index(dataset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For "soft" changes we can use the update mappings and the update settings endpoints:
See docs:
- https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html
If we take care of mapping changes and we do not modify the field type, we could always update mappings without deleting the index. See here
from typer.testing import CliRunner | ||
|
||
|
||
@pytest.mark.asyncio |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pytest.mark.asyncio |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## develop #4404 +/- ##
===========================================
+ Coverage 65.96% 66.07% +0.11%
===========================================
Files 330 333 +3
Lines 19115 19188 +73
===========================================
+ Hits 12609 12679 +70
- Misses 6506 6509 +3 ☔ View full report in Codecov by Sentry. |
It would be nice to add some references in docs about this new task. I think the better place should be here |
Ok, I will add a new section in the docs about this task. |
Done. |
Description
This PR adds a new cli task to reindex datasets and records so we can use it once ElasticSearch/OpenSearch mappings are updated.
To execute the cli task use the following command:
This task will iterate over all the datasets in the database, reindexing them and doing the same for all records for each dataset.
Note
We are using server-side cursors (streams) for fetch the collection of datasets and records in the cli task. For more information take a look to SQLAlchemy documentation.
Closes #4335
Type of change
How Has This Been Tested
Checklist