[Telemetry] The caching mechanism also caches failed payloads #123021
Comments
Pinging @elastic/kibana-core (Team:Core)
We cache usage for 4 hours, so any failed collectors will not be reported until the next caching cycle. This is not a bug but a consequence of how we implemented the caching logic, which works at the top level of the service.
I'm also +1 on this approach. Since we already send one daily report, the caching is now only useful for retry logic and the like. Disabling it would resurface some issues there, so I don't think we should disable it.
This means metricbeat could still add a lot of usage collection load. Does metricbeat call this API when you enable monitoring for a Kibana cluster, or do users need to specifically configure this? How often will metricbeat hit this API? We should ensure that a common feature like monitoring a Kibana cluster doesn't end up adding a lot of additional load.
@rudolf metricbeat uses a different API (stats API) which is not cached. The caching mechanism we introduced is solely for our telemetry. We'd need to coordinate with folks consuming this API before we introduce any caching mechanisms to the collectors there (CC @seanstory @yakhinvadim). Adding a caching layer on the stats API might be awkward, since users might expect more 'real-time' data rather than flat graphs that only change on every caching cycle. We had a discussion last year about dropping the collectors from the stats API completely, which might be worth re-exploring here.
@rudolf Metricbeat should not collect usage anymore starting Prior to that version, it should collect it only once every 24h. So, even if the user runs a previous version of Metricbeat, it shouldn't cause too much of an issue (or we can suggest that they upgrade their Metricbeat agent).
@yakhinvadim and I aren't actually making use of this. Enterprise Search was (but is no longer) using the
The caching mechanism introduced in #117084 caches the full report. This means that if a collector fails during that report generation, the incomplete report will be cached.
This is important because of the scenario detailed by @jportner in this comment #120422 (comment). A user with limited access could cache an incomplete report, and another user with the right permissions requesting the report would then get the cached incomplete version (and vice versa).
Potential solutions:
kibana_system, so it shouldn't have permissions issues.
I'd say option 3 is the best compromise for now.
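To make the failure mode concrete, here is a minimal sketch of a report cache that refuses to store a report when any collector failed. It is an illustration only and not necessarily one of the options listed above. Every name in it (`Collector`, `Report`, `ReportCache`, `collect`) is hypothetical and does not come from the Kibana codebase. The only detail taken from this thread is the 4-hour caching window.

```typescript
// Hypothetical shapes for illustration; not Kibana's real telemetry types.
type Collector = { name: string; fetch: () => Record<string, unknown> };

interface Report {
  usage: Record<string, unknown>;
  failed: string[]; // names of collectors that threw during this run
}

const CACHE_TTL_MS = 4 * 60 * 60 * 1000; // 4 hours, as mentioned in the thread

// Run every collector and record failures instead of aborting the report.
function collect(collectors: Collector[]): Report {
  const report: Report = { usage: {}, failed: [] };
  for (const collector of collectors) {
    try {
      report.usage[collector.name] = collector.fetch();
    } catch {
      report.failed.push(collector.name);
    }
  }
  return report;
}

class ReportCache {
  private cached?: { report: Report; storedAt: number };

  // The clock is injectable so the TTL can be exercised in tests.
  constructor(private readonly now: () => number = () => Date.now()) {}

  getReport(collectors: Collector[]): Report {
    const entry = this.cached;
    if (entry && this.now() - entry.storedAt < CACHE_TTL_MS) {
      return entry.report; // still fresh: serve the cached complete report
    }
    const report = collect(collectors);
    // Assumption of this sketch: an incomplete report is returned to the
    // caller but never stored, so the next request retries the collectors.
    if (report.failed.length === 0) {
      this.cached = { report, storedAt: this.now() };
    }
    return report;
  }
}
```

With this guard, a request made by a user whose permissions cause a collector to fail still gets a (partial) answer, but that answer cannot be served to later callers. The trade-off is that a persistently failing collector defeats the cache entirely, so every request pays the full collection cost.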