Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Saved objects point in time finder: maximum total response size circuit breaker #168244

Open
rudolf opened this issue Oct 6, 2023 · 2 comments
Open
Labels
discuss Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@rudolf
Copy link
Contributor

rudolf commented Oct 6, 2023

With the introduction of the point in time finder after #93770 and #92981 the following problematic pattern has become fairly common

await (for result of finder.find()) {
  results.push(result);
}

Whenever we load all results in memory this adds a lot of memory pressure and worst case can lead to OOMs.

Our developer docs specifically call out that this should not be used in a route handler unless the results are streamed https://github.com/elastic/kibana/blob/main/packages/core/saved-objects/core-saved-objects-api-server/src/saved_objects_client.ts#L340-L346 but I think we should have been much more clear why we have this warning and provide a proper example.

Before plugins used savedObjects#find which would create one request to Elasticsearch. If ever an Elasticsearch response exceeds 536MB Kibana will refuse to deserialise it. So even with a really large page size the saved objects client would not consume more heap than what's necessary to parse a 536MB response (still quite a lot).

Proposed circuit breaker

In almost all cases, loading, decompressing and deserialising more than 500mb of response data is undesirable, even if this happens in several pages. We should rather run aggregations in Elasticsearch, or load a small set of fields to reduce the response payload size.

We should add a circuit breaker to createPointInTimeFinder to stop paging after it collected 536MB (or 256MB?) of response pages. Plugins that need to load more data and know that their code is memory safe, can decide to increase the circuit breaker.

@rudolf rudolf added discuss Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc labels Oct 6, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@pgayvallet
Copy link
Contributor

We should add a circuit breaker to createPointInTimeFinder to stop paging after it collected 536MB (or 256MB?) of response pages

Using a config setting for this default would allow to have an easy escape hatch for customers if needed.

We could then set it to 512mb for traditional and 256mb for serverless, maybe?

Plugins that need to load more data and know that their code is memory safe, can decide to increase the circuit breaker

I wonder, do we want to allow them to increase the memory limit, or to simply disable the circuit breaker?

If we follow the logic that they know their code is "memory safe", then I would assume the best approach, in term of DX, is to simply being able to disable the circuit breaker logic via an option, instead of having to pass an arbitrary memory limit, wdyt?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
Projects
None yet
Development

No branches or pull requests

3 participants