Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: unused union type attributes can use a lot of memory #111964

Closed
alex-spies opened this issue Aug 19, 2024 · 3 comments · Fixed by #111973
Closed

ESQL: unused union type attributes can use a lot of memory #111964

alex-spies opened this issue Aug 19, 2024 · 3 comments · Fixed by #111973
Assignees
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@alex-spies
Copy link
Contributor

Relates #107330

In simple queries like FROM logs-* | LIMIT 10, many indices and fields can be involved. In particular, fields mapped to different types in different indices are just passed through, even though we do nothing with them (their values are always null).

Multi-typed fields internally create attributes which contain an error message string with the different types they're mapped to in different indices, e.g. line 1:32: Cannot use field [some_field] due to ambiguities being mapped as [2] incompatible types: [double] in [logs-1, logs-2, logs-3], [float] in [logs-4]. (More specifically: When multi-typed fields are passed through, they get internally represented as UnsupportedAttributes, whose error message is constructed from InvalidMappedFields here.)

These messages can get long; additionally, they get serialized as part of their respective attributes and are being sent to data nodes, although we likely do not need the error string in the data nodes.

Let's reduce that overhead during planning and when serializing the plan.

@alex-spies
Copy link
Contributor Author

Ideas that we can try:

  • Limit the maximum number of indices mentioned in the error message, e.g. Cannot use field [some_field] due to ambiguities being mapped as [2] incompatible types: [double] in [logs-1, logs-2, logs-3] and [32] other indices, [float] in [logs-4, logs-5, logs-6] and [17] other indices.
  • Check if we can strip the error message before sending the UnsupportedAttributes to data nodes; the error message should be irrelevant after the validation step.
  • Maybe check if we can save some more memory by computing the error message on demand rather than storing it fully expanded; for unsupported types, UnsupportedAttribute alread has the errorMessage method, but we still eagerly instantiate the error message on construction.

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 19, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@luigidellaquila luigidellaquila self-assigned this Aug 19, 2024
@luigidellaquila
Copy link
Contributor

Unfortunately we cannot easily compute the error message afterwards, as we don't preserve the InvalidMappedField that contains all the information (and anyway, transferring the InvalidMappedField would be quite expensive as well).

Probably the best solution is the first one you proposed, ie. reduce the size of the error message.

Stripping the message completely could be an option as well; the data node does not need it. Anyway, for debug reasons and for consistency, I'd prefer to keep the deserialized versions as similar as possible to the original.

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Sep 4, 2024
When dealing with index patterns, eg. `FROM logs-*`, some fields can
have the same name but different types in different indices.

In this case we build an error message like

```
Cannot use field [multi_typed] due to ambiguities being mapped as [2] incompatible types: 
[ip] in [test1, test2], [keyword] in [test3]"
```

With this PR, in case of many indices involved, we avoid listing them
all, but we only list three of them and provide information about how
many other indices are affected, eg.

```
Cannot use field [multi_typed] due to ambiguities being mapped as [2] incompatible types: 
[ip] in [test1, test2, test3] and [2] other indices, [keyword] in [test6]
```

(see the `and [2] other indices`)

Since these error messages are stored in `UnspportedAttributes` and
serialized, this PR reduces significantly the size of a serialized
execution plan with many type conflicts.

Fixes elastic#111964

Related to elastic#111358
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this issue Sep 5, 2024
When dealing with index patterns, eg. `FROM logs-*`, some fields can
have the same name but different types in different indices.

In this case we build an error message like

```
Cannot use field [multi_typed] due to ambiguities being mapped as [2] incompatible types: 
[ip] in [test1, test2], [keyword] in [test3]"
```

With this PR, in case of many indices involved, we avoid listing them
all, but we only list three of them and provide information about how
many other indices are affected, eg.

```
Cannot use field [multi_typed] due to ambiguities being mapped as [2] incompatible types: 
[ip] in [test1, test2, test3] and [2] other indices, [keyword] in [test6]
```

(see the `and [2] other indices`)

Since these error messages are stored in `UnspportedAttributes` and
serialized, this PR reduces significantly the size of a serialized
execution plan with many type conflicts.

Fixes elastic#111964

Related to elastic#111358
luigidellaquila added a commit to luigidellaquila/elasticsearch that referenced this issue Sep 12, 2024
When dealing with index patterns, eg. `FROM logs-*`, some fields can
have the same name but different types in different indices.

In this case we build an error message like

```
Cannot use field [multi_typed] due to ambiguities being mapped as [2] incompatible types:
[ip] in [test1, test2], [keyword] in [test3]"
```

With this PR, in case of many indices involved, we avoid listing them
all, but we only list three of them and provide information about how
many other indices are affected, eg.

```
Cannot use field [multi_typed] due to ambiguities being mapped as [2] incompatible types:
[ip] in [test1, test2, test3] and [2] other indices, [keyword] in [test6]
```

(see the `and [2] other indices`)

Since these error messages are stored in `UnspportedAttributes` and
serialized, this PR reduces significantly the size of a serialized
execution plan with many type conflicts.

Fixes elastic#111964

Related to elastic#111358
elasticsearchmachine pushed a commit that referenced this issue Sep 13, 2024
…12819)

* ES|QL: shorten error messages for UnsupportedAttributes (#111973)

When dealing with index patterns, eg. `FROM logs-*`, some fields can
have the same name but different types in different indices.

In this case we build an error message like

```
Cannot use field [multi_typed] due to ambiguities being mapped as [2] incompatible types:
[ip] in [test1, test2], [keyword] in [test3]"
```

With this PR, in case of many indices involved, we avoid listing them
all, but we only list three of them and provide information about how
many other indices are affected, eg.

```
Cannot use field [multi_typed] due to ambiguities being mapped as [2] incompatible types:
[ip] in [test1, test2, test3] and [2] other indices, [keyword] in [test6]
```

(see the `and [2] other indices`)

Since these error messages are stored in `UnspportedAttributes` and
serialized, this PR reduces significantly the size of a serialized
execution plan with many type conflicts.

Fixes #111964

Related to #111358

* Spotless

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants