[Reporting] Ensure reports:monitor task is never "unrecognized" #118477

Closed
tsullivan opened this issue Nov 12, 2021 · 2 comments

Labels
bug (Fixes for quality problems that affect the customer experience)
impact:high (Addressing this issue will have a high level of impact on the quality/strength of our product.)
loe:small (Small Level of Effort)

Comments

@tsullivan
Member

tsullivan commented Nov 12, 2021

Kibana version: 7.14+

When the Reporting queue system was migrated from ESQueue to Task Manager, a reports:monitor task was created to add self-healing for cases where Kibana crashes or restarts while executing a report.

If this happens, the status of the report would stay in "processing" indefinitely if it weren't for the reports:monitor task, which looks for these kinds of stuck jobs.

Problem: If the reports:monitor task returns an error, Task Manager reschedules it with a time buffer. These buffers can become very large. When this happens, Reporting is not able to self-heal by rescheduling stuck jobs.

Problem: the reports:monitor task is vulnerable to Kibana instances joining the cluster with the Reporting plugin disabled.

Having a mismatch of plugins enabled in different Kibana instances is not supported: it causes problems especially for plugins that are responsible for server background tasks.

When this scenario happens, the instance with Reporting disabled will not recognize the reports:monitor task. Internal logic in Task Manager updates the status of the task to unrecognized, which dictates that this task should stop running.
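
The scenario above can be illustrated with a toy model. This is a sketch only: the `TaskDoc` shape, the `pollOnce` function, the task id, and the "idle" status name are assumptions made up for illustration and do not reflect Task Manager's actual implementation or API. Only the task type names and the "unrecognized" status come from the description above.

```typescript
// Toy model only. Every type and function here is hypothetical and is NOT
// Task Manager's real API; it only mirrors the behavior described above.
type TaskStatus = "idle" | "unrecognized";

interface TaskDoc {
  id: string;
  taskType: string;
  status: TaskStatus;
}

// One polling pass by a single Kibana instance. The instance only knows the
// task types registered by the plugins enabled on it.
function pollOnce(registeredTypes: Set<string>, tasks: TaskDoc[]): void {
  for (const task of tasks) {
    if (task.status !== "idle") {
      continue; // in this model, an "unrecognized" task is skipped from now on
    }
    if (!registeredTypes.has(task.taskType)) {
      task.status = "unrecognized"; // this instance has no definition for the type
    }
  }
}

// Hypothetical shared task document, visible to every instance.
const monitorTask: TaskDoc = {
  id: "example-monitor-task",
  taskType: "reports:monitor",
  status: "idle",
};
const sharedTasks: TaskDoc[] = [monitorTask];

const reportingEnabled = new Set<string>(["reports:monitor", "report:execute"]);
const reportingDisabled = new Set<string>(); // instance with Reporting disabled

pollOnce(reportingEnabled, sharedTasks);
const statusAfterHealthyPoll: TaskStatus = monitorTask.status;

pollOnce(reportingDisabled, sharedTasks);
const statusAfterMismatchedPoll: TaskStatus = monitorTask.status;

// Even an instance that knows the task type no longer runs it in this model.
pollOnce(reportingEnabled, sharedTasks);
const statusAfterLaterPoll: TaskStatus = monitorTask.status;

console.log(statusAfterHealthyPoll, statusAfterMismatchedPoll, statusAfterLaterPoll);
```

In this model, a single poll by the instance without Reporting is enough to leave the shared task "unrecognized", and later polls by correctly configured instances do not bring it back.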

Workaround
Make sure that the same plugins are enabled on all instances: Reporting should be enabled on every instance. If you need to ensure that only one instance is able to claim jobs, set xpack.reporting.queue.pollEnabled: false in the configuration of all instances except that one. You can disable the Reporting UI features on a per-user and per-space basis by using application feature controls. See: https://www.elastic.co/guide/en/kibana/current/secure-reporting.html#grant-user-access

Requirements:

  1. The Reporting plugin should handle xpack.reporting.enabled: false with minimal custom logic to register the reports:monitor task.
  2. The Diagnostic tool should warn the user if there are any problems with the state of the reports:monitor task.
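
As a rough sketch of what the second requirement might look like, the check below flags the two problem states described in this issue: an "unrecognized" status, and a next run time that has been pushed far into the future. The `MonitorTaskState` shape, the `diagnoseMonitorTask` function, the `maxDelayMs` threshold, the "idle" status name, and the "not found" case are all assumptions for illustration; they are not the Diagnostic tool's or Task Manager's actual API.

```typescript
// Hypothetical sketch: these names are illustrative and are not part of the
// Reporting Diagnostic tool or Task Manager.
interface MonitorTaskState {
  status: string; // "unrecognized" is the state this issue is about
  runAt: Date; // next scheduled run time
}

function diagnoseMonitorTask(
  task: MonitorTaskState | undefined,
  now: Date,
  maxDelayMs: number // assumed threshold for "rescheduled too far out"
): string[] {
  if (task === undefined) {
    return ["reports:monitor task was not found"];
  }
  const warnings: string[] = [];
  if (task.status === "unrecognized") {
    warnings.push(
      'reports:monitor task is "unrecognized"; check that Reporting is enabled on every instance'
    );
  }
  const delayMs = task.runAt.getTime() - now.getTime();
  if (delayMs > maxDelayMs) {
    warnings.push(
      `reports:monitor task is not scheduled to run for ${Math.round(delayMs / 1000)} seconds`
    );
  }
  return warnings;
}

const now = new Date("2021-11-12T00:00:00Z");
const oneHourMs = 60 * 60 * 1000;

const healthy = diagnoseMonitorTask(
  { status: "idle", runAt: new Date(now.getTime() + 3000) },
  now,
  oneHourMs
);
const unrecognized = diagnoseMonitorTask({ status: "unrecognized", runAt: now }, now, oneHourMs);
const delayed = diagnoseMonitorTask(
  { status: "idle", runAt: new Date(now.getTime() + 2 * oneHourMs) },
  now,
  oneHourMs
);
const missing = diagnoseMonitorTask(undefined, now, oneHourMs);

console.log({ healthy, unrecognized, delayed, missing });
```

Returning a list of warnings rather than throwing keeps the check usable alongside other diagnostic output; the one-hour threshold is arbitrary and would need to be chosen relative to the monitor task's normal interval.
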
@tsullivan tsullivan added the bug Fixes for quality problems that affect the customer experience label Nov 12, 2021
@botelastic botelastic bot added the needs-team Issues missing a team label label Nov 12, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-app-services (Team:AppServicesUx)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Nov 12, 2021
@tsullivan tsullivan added the impact:critical This issue should be addressed immediately due to a critical level of impact on the product. label Nov 23, 2021
@tsullivan tsullivan changed the title from [Reporting] Add health check / diagnostic for the reports:monitor task to [Reporting] Ensure reports:monitor task is never "unrecognized" Nov 23, 2021
@exalate-issue-sync exalate-issue-sync bot added loe:small Small Level of Effort impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. and removed impact:critical This issue should be addressed immediately due to a critical level of impact on the product. labels Nov 23, 2021
@tsullivan
Member Author

Replacing with #120995
