feat: index records with suggestions for search engine #4317

Conversation

frascuchon
Member

@frascuchon frascuchon commented Nov 24, 2023

Description

This PR adds support for indexing suggestions when indexing records in the search index.

To make sure that all record attributes are available when indexing records, a workaround forces each record, its responses, and its suggestions to refresh before indexing them. Otherwise, it is hard to add that information when loading records in the current code base.

This behavior must be reviewed and simplified. cc @gabrielmbmb @jfcalvo

Changes related to tests will be moved into a separate PR, since extra refactoring work must be done. Once that extra PR is merged into this one, this PR will be marked as ready for review. Related PR: #4318

Refs: #3849

Closes #4230

Also, simplify and remove all extra mapper components, and reuse some code blocks regarding the elastic field definitions.
@frascuchon frascuchon self-assigned this Nov 24, 2023
@frascuchon frascuchon changed the title Feat/index records with suggestions for search engine feat: index records with suggestions for search engine Nov 24, 2023
@frascuchon frascuchon force-pushed the feat/index-records-with-suggestions-for-search-engine branch from 63f9ced to cf58a17 Compare November 24, 2023 14:01
src/argilla/server/contexts/datasets.py (outdated)
Comment on lines 622 to 623
await refresh_records(records)
await search_engine.index_records(dataset, records)
Member

I'm not sure it's the best approach, but to explicitly link preloading the records with the search engine, maybe we can do the following:

Suggested change
- await refresh_records(records)
- await search_engine.index_records(dataset, records)
+ await search_engine.index_records(
+     dataset,
+     await _preload_records_associations(records),
+ )

Or if you don't want to do that:

Suggested change
- await refresh_records(records)
- await search_engine.index_records(dataset, records)
+ await _preload_records_associations(records)
+ await search_engine.index_records(dataset, records)

Comment on lines 883 to 884
await refresh_records(records)
await search_engine.index_records(dataset, records)
Member

As said before we can do:

Suggested change
- await refresh_records(records)
- await search_engine.index_records(dataset, records)
+ await search_engine.index_records(
+     dataset,
+     await _preload_records_associations(records)
+ )

If you don't want to do that, you can in any case change it to:

Suggested change
- await refresh_records(records)
- await search_engine.index_records(dataset, records)
+ await _preload_records_associations(records)
+ await search_engine.index_records(dataset, records)

Comment on lines 923 to 924
await refresh_records([record])
await search_engine.index_records(record.dataset, [record])
Member

As said before:

Suggested change
- await refresh_records([record])
- await search_engine.index_records(record.dataset, [record])
+ await search_engine.index_records(
+     record.dataset,
+     [await _preload_record_associations(record)]
+ )

Or:

Suggested change
- await refresh_records([record])
- await search_engine.index_records(record.dataset, [record])
+ await _preload_record_associations(record)
+ await search_engine.index_records(record.dataset, [record])

Member Author

For the first approach, that new function should return the loaded record list. We can review it together
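
A minimal sketch of the first approach from this thread, in which the preload helper returns the records it loaded so the call can be nested inside `index_records`. `FakeRecord`, `FakeSearchEngine`, `load`, and the helper body are hypothetical stand-ins written for illustration, not Argilla's classes; the real helper would presumably await `record.awaitable_attrs.<name>` on the ORM objects instead.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeRecord:
    """Hypothetical stand-in for the ORM Record (assumption, not Argilla's model)."""

    id: int
    loaded: List[str] = field(default_factory=list)

    async def load(self, name: str) -> None:
        # Stands in for `await record.awaitable_attrs.<name>` on a real ORM object.
        await asyncio.sleep(0)
        if name not in self.loaded:
            self.loaded.append(name)


async def preload_records_associations(records: List[FakeRecord]) -> List[FakeRecord]:
    """Load every association the indexer needs, then return the same list."""
    for record in records:
        for name in ("responses", "suggestions", "vectors"):
            await record.load(name)
    # Returning the list is what allows the nested call shown in main().
    return records


class FakeSearchEngine:
    """Hypothetical engine that only remembers what it was asked to index."""

    def __init__(self) -> None:
        self.indexed: List[Dict] = []

    async def index_records(self, dataset: str, records: List[FakeRecord]) -> None:
        for record in records:
            self.indexed.append(
                {"dataset": dataset, "id": record.id, "loaded": list(record.loaded)}
            )


async def main() -> List[Dict]:
    engine = FakeSearchEngine()
    records = [FakeRecord(1), FakeRecord(2)]
    # Preloading and indexing are linked in a single expression.
    await engine.index_records("my-dataset", await preload_records_associations(records))
    return engine.indexed


indexed = asyncio.run(main())
```

With a helper that returns `None`, only the second (two-statement) form of the suggestion works, which is the point raised in the reply above.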

UserResponseStatusFilter,
)

ALL_RESPONSES_STATUSES_FIELD = "all_responses_statuses"


class SearchDocumentGetter(GetterDict):
Member

Loving these code deletions.

raise Exception(f"Index configuration for metadata property of type {property_type} cannot be generated")


def es_mapping_for_question_type(question_type: QuestionType):
Member

Maybe I'm missing something (tell me if I'm wrong), but shouldn't this function be named es_mapping_for_question and receive a question instead of the type?

Suggested change
- def es_mapping_for_question_type(question_type: QuestionType):
+ def es_mapping_for_question(question: Question):

Inside, you can branch on question.type.
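
A rough sketch of what that suggestion could look like. The `QuestionType` members, the `Question` dataclass, and the field types chosen below are assumptions made for illustration; they are not the mappings this PR actually generates.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class QuestionType(str, Enum):
    # Hypothetical subset of question types, for illustration only.
    text = "text"
    rating = "rating"
    label_selection = "label_selection"
    ranking = "ranking"


@dataclass
class Question:
    # Hypothetical stand-in for the ORM Question model.
    name: str
    type: QuestionType


# Assumed Elasticsearch field types per question type (not taken from the PR).
_FIELD_TYPE_BY_QUESTION_TYPE: Dict[QuestionType, str] = {
    QuestionType.text: "text",
    QuestionType.rating: "integer",
    QuestionType.label_selection: "keyword",
}


def es_mapping_for_question(question: Question) -> Dict[str, Any]:
    """Receive the whole question and branch on `question.type` internally."""
    try:
        return {"type": _FIELD_TYPE_BY_QUESTION_TYPE[question.type]}
    except KeyError:
        raise ValueError(
            f"Index configuration for question of type {question.type.value} cannot be generated"
        ) from None
```

Taking the question rather than its type leaves room for the mapping to depend on other question attributes later without changing the signature.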

frascuchon and others added 7 commits November 27, 2023 06:54
This PR changes and unifies tests related to the search engine.

- All tests which will return the same results for the different engine
implementations are placed in `test_commons.py`
- Specific tests for elasticsearch implementation are placed in
`test_elasticsearch.py`
- Specific tests for opensearch implementation are placed in
`test_elasticsearch.py`

Refs: #4318

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: José Francisco Calvo <jose@argilla.io>
…of github.com:argilla-io/argilla into feat/index-records-with-suggestions-for-search-engine
…of github.com:argilla-io/argilla into feat/index-records-with-suggestions-for-search-engine
@frascuchon frascuchon marked this pull request as ready for review November 27, 2023 06:03
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. area: server Indicates that an issue or pull request is related to the server language: python Pull requests or issues that update Python code team: backend Indicates that the issue or pull request is owned by the backend team type: enhancement Indicates new feature requests labels Nov 27, 2023

The URL of the deployed environment for this PR is https://argilla-quickstart-pr-4317-ki24f765kq-no.a.run.app

await record.awaitable_attrs.vectors

for response in record.responses:
    await response.awiatable_attrs.user
Member

Suggested change
- await response.awiatable_attrs.user
+ await response.awaitable_attrs.user

Comment on lines 805 to 806
for suggestion in record.suggestions:
    await suggestion.awaitable_attrs.question
Member

I think that here we could do await record.dataset.awaitable_attrs.questions.

@@ -500,6 +469,64 @@ def _build_response_status_filter(status_filter: UserResponseStatusFilter) -> Di
def _inverse_vector(self, vector_value: List[float]) -> List[float]:
return [vector_value[i] * -1 for i in range(0, len(vector_value))]

def _map_record_to_es_document(self, record: Record) -> Dict[str, Any]:
Member

👌🏻

await _preload_record_associations(record)


async def _preload_record_associations(record: Record) -> None:
Member

Ignore my comments if, in the end, we're going to query the data explicitly.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Nov 27, 2023
@jfcalvo jfcalvo merged commit 75ccee4 into feature/responses-and-suggestion-filter Nov 27, 2023
@jfcalvo jfcalvo deleted the feat/index-records-with-suggestions-for-search-engine branch November 27, 2023 15:26
Development

Successfully merging this pull request may close these issues.

[FEATURE] Index records in elasticsearch index with suggestion info (agent, value , and score)
3 participants