Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autogenerate aggregation responses and hits #1932

Merged

Conversation

miguelgrinberg
Copy link
Collaborator

No description provided.

@miguelgrinberg miguelgrinberg force-pushed the autogenerate-aggregations-and-hit branch 4 times, most recently from b6f3fe4 to b218090 Compare October 22, 2024 22:42
@miguelgrinberg miguelgrinberg marked this pull request as ready for review October 22, 2024 22:48
@miguelgrinberg miguelgrinberg force-pushed the autogenerate-aggregations-and-hit branch from b218090 to 9777a93 Compare October 22, 2024 22:50


class HitMeta(HitMetaBase):
inner_hits: Mapping[str, Any]
Copy link
Collaborator Author

@miguelgrinberg miguelgrinberg Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't found a way to type inner hints appropriately without making breaking changes. I think if we have the intention to have proper typing we'll have to evaluate a change here.

For example, consider:

hit.meta.inner_hits["passages"][0].content

Here it is impossible to type the content attribute of the first Passage document, since inner_hits is a dictionary, and each entry in this dict maps to a different document type.

We may consider changing to something like:

hit.passages.meta.inner_hits[0].content

I think this may be possible to type because the inner hits belong to the nested document, and hit.passages already has the correct type. The problem is that hit.passages is a list, so adding attributes to the list feels weird.

In any case, this feels like a bigger change that cannot be rushed.

@@ -30,7 +31,7 @@ async def scan_aggs(
source_aggs: Sequence[Mapping[str, Agg]],
inner_aggs: Dict[str, Agg] = {},
size: int = 10,
) -> AsyncIterator[Response]:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Response here was plainly wrong, not sure how this passed mypy to be honest!

@@ -161,8 +159,6 @@ async def add_answer(
answer = Answer(
# required make sure the answer is stored in the same shard
_routing=self.meta.id,
# since we don't have explicit index, ensure same index as self
_index=self.meta.index,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't necessary on this example, since all the documents belong to the same index. I removed it because it creates a typing problem, because _index is stored as an Index or AsyncIndex, but here it was given as a string.

@@ -625,6 +638,9 @@ def interface_to_python_class(
"required": bool,
"positional": bool,
],
"buckets_as_dict": "type" # optional, only present in aggregation response
# classes that have buckets that can have a list
# or dict representation
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make typing easier aggregations have buckets typed as a list and buckets_as_dict typed as a dict. You can use whichever makes sense depending on how yo request your aggregations. It's a compromise solution to avoid having to apply casts on a type that is a union of list and dict buckets.

Copy link
Member

@pquentin pquentin Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with "type" here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"type" here is the type that the buckets_as_dict property will be. This is used on the types Jinja template:

    @property
    def buckets_as_dict(self) -> Mapping[str, {{ k.buckets_as_dict }}]:
        return self.buckets  # type: ignore

@miguelgrinberg miguelgrinberg marked this pull request as draft October 23, 2024 08:54
@miguelgrinberg miguelgrinberg marked this pull request as ready for review November 4, 2024 10:20
yield b
if "after_key" in response.aggregations.comp:
after = response.aggregations.comp.after_key
while response.aggregations["comp"].buckets:
Copy link
Member

@pquentin pquentin Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching from .comp to ["comp"] feels like a net negative, but if that helps mypy type correctly (and raise an error on a typo) then that's OK!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I agree. To me it feels like using the dict syntax makes it clear that this is a dynamic data item that is application specific. But maybe I convinced myself this is better.

Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM.

@miguelgrinberg miguelgrinberg merged commit 601bea7 into elastic:main Nov 6, 2024
16 checks passed
@miguelgrinberg miguelgrinberg added the backport 8.x Backport to 8.x label Nov 6, 2024
github-actions bot pushed a commit that referenced this pull request Nov 6, 2024
* typing of aggregation responses

* buckets_as_dict property

* typing of the `meta` attribute of hits

* more minor adjustments to response types

(cherry picked from commit 601bea7)
miguelgrinberg added a commit that referenced this pull request Nov 6, 2024
* typing of aggregation responses

* buckets_as_dict property

* typing of the `meta` attribute of hits

* more minor adjustments to response types

(cherry picked from commit 601bea7)

Co-authored-by: Miguel Grinberg <miguel.grinberg@gmail.com>
miguelgrinberg added a commit to miguelgrinberg/elasticsearch-dsl-py that referenced this pull request Dec 9, 2024
* typing of aggregation responses

* buckets_as_dict property

* typing of the `meta` attribute of hits

* more minor adjustments to response types
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 8.x Backport to 8.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants