-
Notifications
You must be signed in to change notification settings - Fork 406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: index records with suggestions for search engine #4317
feat: index records with suggestions for search engine #4317
Conversation
Also, simplify and remove all extra mapper componets and reuse some code blocks regarding the elastic field definitions
for more information, see https://pre-commit.ci
63f9ced
to
cf58a17
Compare
await refresh_records(records) | ||
await search_engine.index_records(dataset, records) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's the best approach but to explicitly link the preload of the records with the search engine maybe we can do the following:
await refresh_records(records) | |
await search_engine.index_records(dataset, records) | |
await search_engine.index_records( | |
dataset, | |
await _preload_records_associations(records), | |
) |
Or if you don't want to do that:
await refresh_records(records) | |
await search_engine.index_records(dataset, records) | |
await _preload_records_associations(records) | |
await search_engine.index_records(dataset, records) |
await refresh_records(records) | ||
await search_engine.index_records(dataset, records) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said before we can do:
await refresh_records(records) | |
await search_engine.index_records(dataset, records) | |
await search_engine.index_records( | |
dataset, | |
await _preload_records_associations(records) | |
) |
If you don't want to do that in any case you can change it to:
await refresh_records(records) | |
await search_engine.index_records(dataset, records) | |
await _preload_records_associations(records) | |
await search_engine.index_records(dataset, records) |
await refresh_records([record]) | ||
await search_engine.index_records(record.dataset, [record]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said before:
await refresh_records([record]) | |
await search_engine.index_records(record.dataset, [record]) | |
await search_engine.index_records( | |
record.dataset, | |
[await _preload_record_associations(record)] | |
) |
Or:
await refresh_records([record]) | |
await search_engine.index_records(record.dataset, [record]) | |
await _preload_record_associations(record) | |
await search_engine.index_records(record.dataset, [record]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the first approach, that new function should return the loaded record list. We can review it together
UserResponseStatusFilter, | ||
) | ||
|
||
ALL_RESPONSES_STATUSES_FIELD = "all_responses_statuses" | ||
|
||
|
||
class SearchDocumentGetter(GetterDict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loving these code deletions.
raise Exception(f"Index configuration for metadata property of type {property_type} cannot be generated") | ||
|
||
|
||
def es_mapping_for_question_type(question_type: QuestionType): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something (tell me if I'm wrong) but should not be this function better named es_mapping_for_question
and receive a question instead of the type?
def es_mapping_for_question_type(question_type: QuestionType): | |
def es_mapping_for_question(question: Question): |
Inside you can do the if question.type
conditional.
This PR changes and unifies tests related to the search engine. - All tests which will return the same results for the different engine implementations are placed in `test_commons.py` - Specific tests for elasticsearch implementation are placed in `test_elasticsearch.py` - Specific tests for opensearch implementation are placed in `test_elasticsearch.py` Refs: #4318 --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: José Francisco Calvo <jose@argilla.io>
…of github.com:argilla-io/argilla into feat/index-records-with-suggestions-for-search-engine
for more information, see https://pre-commit.ci
…of github.com:argilla-io/argilla into feat/index-records-with-suggestions-for-search-engine
The URL of the deployed environment for this PR is https://argilla-quickstart-pr-4317-ki24f765kq-no.a.run.app |
await record.awaitable_attrs.vectors | ||
|
||
for response in record.responses: | ||
await response.awiatable_attrs.user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
await response.awiatable_attrs.user | |
await response.awaitable_attrs.user |
for suggestion in record.suggestions: | ||
await suggestion.awaitable_attrs.question |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that here we could do await record.dataset.awaitable_attrs.questions
@@ -500,6 +469,64 @@ def _build_response_status_filter(status_filter: UserResponseStatusFilter) -> Di | |||
def _inverse_vector(self, vector_value: List[float]) -> List[float]: | |||
return [vector_value[i] * -1 for i in range(0, len(vector_value))] | |||
|
|||
def _map_record_to_es_document(self, record: Record) -> Dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌🏻
await _preload_record_associations(record) | ||
|
||
|
||
async def _preload_record_associations(record: Record) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ignore my comments, if at the end we're going to query the data explicitly
… to only do one query per record
Description
This PR adds support for indexing suggestions when indexing records in the search index.
In order to be sure that all record attributes are passed when indexing records, a workaround has been implemented by forcing a record, responses, and suggestions to refresh before indexing them. Otherwise is hard to add that info when loading records in the current code base.
This behavior must be reviewed and simplified cc @gabrielmbmb @jfcalvo
Changes related to tests will be moved into a separate PR since extra refactoring work must be done. Once this extra PR is merged here, the PR will be marked as ready for review. Related PR #4318
Refs: #3849
Closes #4230