
Cache usage data to prevent expensive collection queries #117084

Closed
rudolf opened this issue Nov 2, 2021 · 7 comments · Fixed by #119312
Labels
Feature:Telemetry · impact:high · loe:medium · performance · Team:Core

Comments

@rudolf
Contributor

rudolf commented Nov 2, 2021

Every browser session will request usage data every 24 hours. In large clusters, collecting usage data leads to expensive Elasticsearch queries and large response payloads, which affect the performance of both Elasticsearch and Kibana (#93770).

When Elasticsearch or Kibana performance is degraded, these expensive queries take longer to complete, causing timeouts; each browser then retries the request (#115221).

Instead of collecting this usage data anew for every browser session that requests it, we should cache the data server side.

@rudolf rudolf added the Team:Core and Feature:Telemetry labels Nov 2, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@rudolf
Contributor Author

rudolf commented Nov 2, 2021

As a first step, we could use an in-memory-only cache, meaning that if there's more than one Kibana server, each server will repeat the usage collection.
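A per-server in-memory cache with a TTL could look something like the sketch below. The `fetchUsage` callback, the class name, and the TTL value are illustrative assumptions, not the actual Kibana implementation:

```typescript
// Minimal sketch of a per-server in-memory cache for usage data.
// All names and the TTL are hypothetical, for illustration only.
type Fetcher<T> = () => Promise<T>;

class UsageCache<T> {
  private value: T | undefined;
  private fetchedAt = 0;

  constructor(
    private readonly fetchUsage: Fetcher<T>,
    private readonly ttlMs: number
  ) {}

  async get(): Promise<T> {
    const now = Date.now();
    // Serve the cached payload while it is fresh; otherwise collect again.
    if (this.value !== undefined && now - this.fetchedAt < this.ttlMs) {
      return this.value;
    }
    this.value = await this.fetchUsage();
    this.fetchedAt = now;
    return this.value;
  }
}
```

Each Kibana server would hold its own instance, so with N servers the collection still runs up to N times per TTL window, which is the trade-off described above.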

It might be worth creating a usage collection task that runs every X hours and stores its results in a document with a last-updated value. Kibana could then just read whatever is in that document. But this document would be quite large (I've seen > 37 MB), so we would have to exclude it from, e.g., saved object migrations.
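The task-based variant could be sketched roughly as follows. The `SnapshotStore` interface stands in for whatever storage would actually be used (e.g. a saved object excluded from migrations), and all names are hypothetical:

```typescript
// Sketch of a periodic collection task that writes one "usage snapshot"
// document with a last-updated timestamp. Field and interface names are
// assumptions, not the real Kibana types.
interface UsageSnapshot {
  lastUpdated: string; // ISO timestamp of the last successful collection
  payload: Record<string, unknown>;
}

interface SnapshotStore {
  write(doc: UsageSnapshot): Promise<void>;
  read(): Promise<UsageSnapshot | undefined>;
}

// Run by a scheduler every X hours; readers only ever call store.read().
async function runCollectionTask(
  store: SnapshotStore,
  collect: () => Promise<Record<string, unknown>>
): Promise<void> {
  const payload = await collect();
  await store.write({ lastUpdated: new Date().toISOString(), payload });
}
```

With this shape, serving a usage request is a single document read regardless of how many browsers ask, at the cost of the snapshot being up to X hours stale.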

@exalate-issue-sync exalate-issue-sync bot added the impact:high and loe:medium labels and removed the impact:low and loe:small labels Nov 2, 2021
@rudolf
Contributor Author

rudolf commented Nov 2, 2021

Beats might be using this API, so we should check the impact this change could have on that integration.

@rudolf rudolf changed the title Usage data is collected anew for every browser session request Cache usage data to prevent expensive collection queries Nov 2, 2021
@pgayvallet
Contributor

> As a first step, we could use an in-memory only cache, meaning if there's more than one Kibana server each server will repeat the usage collection.

Seems good enough for an initial implementation.

> Every browser session will request usage data every 24 hours

I'm sorry, I tried to find where this is done in the code but couldn't. Is this performed at a fixed time (e.g. the same time every day for each browser session), or does the browser send the data 24h after its initial load? I'm asking because, depending on the answer, caching the data on the server side may be harder, or at least we'd need a lower TTL for the cache.

@rudolf
Contributor Author

rudolf commented Nov 4, 2021

https://github.com/elastic/kibana/blob/master/src/plugins/telemetry/public/services/telemetry_sender.ts#L86

We try to send usage data and, if successful, store the timestamp in localStorage. Every minute that the browser is open, we check whether 24 hours have elapsed since the last success timestamp.
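The check described above amounts to something like this sketch. The storage key name and the helper are assumptions for illustration; see telemetry_sender.ts (linked above) for the real logic:

```typescript
// Sketch of the browser-side check: each minute, compare the stored
// last-success timestamp against a 24h interval. The key name and the
// KeyValueStore abstraction (a localStorage stand-in) are hypothetical.
interface KeyValueStore {
  getItem(key: string): string | null;
}

const LAST_REPORTED_KEY = 'telemetry.lastReported';
const REPORT_INTERVAL_MS = 24 * 60 * 60 * 1000;

function shouldReport(now: number, storage: KeyValueStore): boolean {
  // Missing or unparsable timestamp counts as "never reported".
  const last = Number(storage.getItem(LAST_REPORTED_KEY) ?? 0);
  return now - last >= REPORT_INTERVAL_MS;
}
```

The per-minute polling only decides *whether* to send; the 24h window is anchored to the last successful send, not to a fixed time of day.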

Given the urgency, simple server-side caching is a good first step, but I think we can do more here, like asking browsers to report to the server when they successfully send a payload. If one browser session has sent usage data, we don't need the other browsers to also send it. So once every 24 hours the browser could ask the server "has anyone sent telemetry?". If the answer is no, the browser tries; sometimes multiple browsers will do so in parallel, but that's fine. At least not all of, e.g., 100 users of a cluster would be trying to send the same payload.
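The "has anyone sent telemetry?" coordination could be sketched server-side roughly like this (all names are hypothetical, and a real version would need to survive restarts and multiple Kibana servers):

```typescript
// Sketch of server-side coordination: record when any browser last
// reported successfully, and let other browsers ask before sending.
// In-memory only, so each Kibana server tracks its own window.
class TelemetryCoordinator {
  private lastSentAt = 0;

  constructor(private readonly intervalMs: number) {}

  // Answers "has anyone sent telemetry in the current window?"
  hasRecentReport(now: number): boolean {
    return now - this.lastSentAt < this.intervalMs;
  }

  // Called when a browser confirms a successful send.
  markSent(now: number): void {
    this.lastSentAt = now;
  }
}
```

A browser would call `hasRecentReport` before attempting a send and `markSent` afterwards; parallel senders in the gap are harmless, as noted above, since the goal is only to avoid every session sending.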

@rudolf
Contributor Author

rudolf commented Nov 4, 2021

Given that the browser retries every 60s and that usage collection could theoretically take longer than 60s to complete, we should also consider what happens if the cache isn't primed. In that case we would want to start usage collection only once, regardless of how many browsers are requesting it.
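One common way to start collection only once is to share a single in-flight promise among concurrent callers; a minimal sketch, not the actual implementation:

```typescript
// Sketch of "single-flight" deduplication: while a collection is running,
// every concurrent caller awaits the same promise instead of starting
// another expensive collection.
function singleFlight<T>(fetch: () => Promise<T>): () => Promise<T> {
  let inflight: Promise<T> | undefined;
  return () => {
    if (!inflight) {
      inflight = fetch().finally(() => {
        // Clear after settling so a later cold-cache request fetches again.
        inflight = undefined;
      });
    }
    return inflight;
  };
}
```

Combined with the TTL cache above, a cold cache would trigger exactly one collection even if many browsers hit the endpoint during the slow first request.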

@marxello

marxello commented Nov 4, 2021

> As a first step, we could use an in-memory only cache, meaning if there's more than one Kibana server each server will repeat the usage collection.
>
> It might be worth creating a usage collection task that runs every X hours, and stores its results in a document with a last updated value. Kibana could then just read whatever is in the document. But this document would be quite large (I've seen > 37MB) so we would have to exclude it from e.g. saved object migrations.

Hi Rudolf, I highly recommend making the cache refresh interval configurable, so that everyone can adjust it to their environment.
