Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Task Manager] Adding list of explicitly de-registered task types #123963

Merged
merged 16 commits into from
Feb 15, 2022

Conversation

ymao1
Copy link
Contributor

@ymao1 ymao1 commented Jan 27, 2022

Resolves #122389

Summary

Adds a list of explicitly de-registered task types and updates claim script to only mark a task as unrecognized if the task type exists in that list.

In order to explicitly stop registering a task, plugin authors will need to remove the task registration code as well as add the task type to the REMOVED_TYPES list in x-pack/plugins/task_manager/server/task_type_dictionary.ts

  • If the task is not registered but not added to REMOVED_TYPES, it will not be marked as unrecognized. It will just not be claimed.
  • If a task is added to REMOVED_TYPES but still registered, an error will be thrown. These scenarios should be caught in development.

Todo

  • Add old siem rule type to list.

To Verify

  1. Create a rule that runs frequently and make sure it runs (I used the People in Space rule). Query for the rule in the task manager index and see that status is either idle or running.
  2. Update the code so that this rule type is no longer registered with alerting
--- a/x-pack/examples/alerting_example/server/plugin.ts
+++ b/x-pack/examples/alerting_example/server/plugin.ts
@@ -28,7 +28,7 @@ export interface AlertingExampleDeps {
 export class AlertingExamplePlugin implements Plugin<void, void, AlertingExampleDeps> {
   public setup(core: CoreSetup, { alerting, features }: AlertingExampleDeps) {
     alerting.registerType(alwaysFiringAlert);
-    alerting.registerType(peopleInSpaceAlert);
+    // alerting.registerType(peopleInSpaceAlert);
  1. Verify that the rule no longer runs and the status stays idle
  2. Revert the update made in step 2. Verify that the rule starts running again.
  3. Update the code to add this rule type to the list of removed types
--- a/x-pack/plugins/task_manager/server/task_type_dictionary.ts
+++ b/x-pack/plugins/task_manager/server/task_type_dictionary.ts
@@ -14,6 +14,7 @@ import { Logger } from '../../../../src/core/server';
 export const REMOVED_TYPES: string[] = [
   // for testing
   'sampleTaskRemovedType',
+  'alerting:example.people-in-space',
 ];
  1. Verify that the rule status is updated to unrecognized.

Checklist

@ymao1
Copy link
Contributor Author

ymao1 commented Feb 2, 2022

@elasticmachine merge upstream

@ymao1
Copy link
Contributor Author

ymao1 commented Feb 3, 2022

@elasticmachine merge upstream

@ymao1 ymao1 changed the title Adding REMOVED_TYPES to task manager and only marking those types as … [Task Manager] Adding list of explicitly de-registered task types Feb 3, 2022
@ymao1 ymao1 self-assigned this Feb 3, 2022
@ymao1 ymao1 added backport:skip This commit does not require backporting Feature:Task Manager release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.2.0 labels Feb 3, 2022
@ymao1 ymao1 marked this pull request as ready for review February 3, 2022 17:06
@ymao1 ymao1 requested a review from a team as a code owner February 3, 2022 17:06
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

*/
export const REMOVED_TYPES: string[] = [
// for testing
'sampleTaskRemovedType',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this PR is merged will we not actually have any type in here?
It raises the question of whether we need the unusedTypes feature at all, no? 🤔

@pmuellr
Copy link
Member

pmuellr commented Feb 9, 2022

This reminds me of an issue I forgot to open up, when looking to disable apm telemetry. I believe when I looked into this, if apm was configured to disable telemetry, the existing telemetry task would eventually get marked as unrecognized, with no way to reverse that without explicitly deleting the task doc from the TM index. I think if we merge something like this, the existing code would then work fine.

Start here, where the apm plugin sets up telemetry stuff, including registering the telemetry task, in the plugin setup(), by calling createApmTelemetry() - but that call is made conditionally:

public setup(
core: CoreSetup<APMPluginStartDependencies>,
plugins: APMPluginSetupDependencies
) {
this.logger = this.initContext.logger.get();
const config$ = this.initContext.config.create<APMConfig>();
core.savedObjects.registerType(apmIndices);
core.savedObjects.registerType(apmTelemetry);
core.savedObjects.registerType(apmServerSettings);
const currentConfig = this.initContext.config.get<APMConfig>();
this.currentConfig = currentConfig;
if (
plugins.taskManager &&
plugins.usageCollection &&
currentConfig.telemetryCollectionEnabled
) {
createApmTelemetry({
core,
config$,
usageCollector: plugins.usageCollection,
taskManager: plugins.taskManager,
logger: this.logger,
kibanaVersion: this.initContext.env.packageInfo.version,
});
}

The actual task registration is in that createApmTelemetry() function:

export async function createApmTelemetry({
core,
config$,
usageCollector,
taskManager,
logger,
kibanaVersion,
}: {
core: CoreSetup;
config$: Observable<APMConfig>;
usageCollector: UsageCollectionSetup;
taskManager: TaskManagerSetupContract;
logger: Logger;
kibanaVersion: string;
}) {
taskManager.registerTaskDefinitions({
[APM_TELEMETRY_TASK_NAME]: {
title: 'Collect APM usage',
createTaskRunner: () => {
return {
run: async () => {
await collectAndStore();
},
cancel: async () => {},
};
},
},
});

It wouldn't surprise me to find other task registrations that are conditional like this. Again, I >think< with the change in this PR, such tasks would be ignored, until Kibana was later restarted and when down a conditional path to register the task again. And wouldn't end up in the unrecognized task black hole. Am I thinking about that right?

@ymao1 ymao1 requested review from mikecote and pmuellr February 10, 2022 13:19
@ymao1 ymao1 requested a review from ersin-erdal February 10, 2022 13:19
@ymao1
Copy link
Contributor Author

ymao1 commented Feb 10, 2022

It wouldn't surprise me to find other task registrations that are conditional like this. Again, I >think< with the change in this PR, such tasks would be ignored, until Kibana was later restarted and when down a conditional path to register the task again. And wouldn't end up in the unrecognized task black hole. Am I thinking about that right?

@pmuellr Yes, that's correct. With the change in this PR, those tasks would not get marked as unrecognized unless they are specifically added to the UNUSED_TYPES list. Instead task manager would just ignore them so they stay idle until a Kibana that does register the task type tries to claim them.

@@ -126,7 +127,7 @@ if (doc['task.runAt'].size()!=0) {
ctx._source.task.status = "claiming"; ${Object.keys(fieldUpdates)
.map((field) => `ctx._source.task.${field}=params.fieldUpdates.${field};`)
.join(' ')}
} else if (!params.skippedTaskTypes.contains(ctx._source.task.taskType)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to test the skipped tasks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we maintained a list of claimableTaskTypes and skippedTaskTypes. These are all the types that this task manager knows about so if a task is not claimable or skipped, we mark is as unrecognized. Now we are just using claimableTaskTypes to claim and unusedTaskTypes to determine what to mark as unrecognized. Skipped task types will go into the else bucket and get a no-op.

Copy link
Contributor

@mikecote mikecote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM, tested locally the following scenarios:

  • Alerting rule with a task in "unrecognized" status returning to idle after upgrade
  • Alerting rule with a task definition missing did not make the task have an unrecognized status
  • Alerting rule with a task definition missing and part of REMOVED_TYPES caused the task status to become unrecognized

👍

@ymao1
Copy link
Contributor Author

ymao1 commented Feb 14, 2022

@elasticmachine merge upstream

@ymao1 ymao1 requested a review from ersin-erdal February 14, 2022 13:06
updateFieldsAndMarkAsFailed(fieldUpdates, claimTasksById, ['foo', 'bar'], [], {
foo: 5,
bar: 2,
updateFieldsAndMarkAsFailed({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean Code, group the arguments, love it :)

Copy link
Contributor

@ersin-erdal ersin-erdal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ymao1
Copy link
Contributor Author

ymao1 commented Feb 15, 2022

@elasticmachine merge upstream

'sampleTaskRemovedType',

// deprecated in https://github.com/elastic/kibana/pull/121442
'alerting:siem.signals',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madirey @marshallmain

Tagging you on this PR for awareness. We adding an explicit registry of removed task types that will be marked as unrecognized. Adding alerting:siem.signals due to this PR that removed that rule type. There should be no change to the outcome of that PR (meaning old siem.signals rule types will still be marked as unrecognized)

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

@ymao1 ymao1 merged commit c2a0103 into elastic:main Feb 15, 2022
@tsullivan
Copy link
Member

I just found out about this PR by investigating #120995 and finding I could no longer reproduce the Reporting issue. Thank you @ymao1!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Task Manager release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Task Manager] Investigate better handling of unregistered task types
9 participants