Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APM] Threshold alerts #59566

Merged
merged 7 commits into from
Mar 24, 2020
Merged

Conversation

dgieselaar
Copy link
Member

@dgieselaar dgieselaar commented Mar 6, 2020

Closes #49038 (actually, does it? should we remove the watcher functionality in this PR as well we'll keep the Watcher one as well for the time being).

Out of scope:

I've not yet tested all scenarios, but it would be helpful to get some feedback while I'm out on spacetime so I can pick it up directly after.

Related issues:

Relevant changes for @elastic/kibana-alerting-services are mostly in a77f088bd5aec73716cb68a0fb9ceaeb19fa9adc and 73d6bdc5b111c61038e06e488ef37212c55a74bc.

image
image
image
image

'save',
'alerting:show',
'actions:show',
'alerting:save',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to grant users granular access, like alerting:show bu not actions:show?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible for us? Or for admins?

@dgieselaar
Copy link
Member Author

dgieselaar commented Mar 10, 2020 via email

@@ -88,22 +88,31 @@ export const apm: LegacyPluginInitializer = kibana => {
catalogue: ['apm'],
privileges: {
all: {
api: ['apm', 'apm_write'],
api: ['apm', 'apm_write', 'actions-read', 'alerting-read'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apm_write uses snake_case. Should we stay consistent?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw. Should apm be renamed to apm_read?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah.. actions-read is what's defined in the Action app endpoints...
My mind is a little hazy on this mapping concept. Are we mapping tags to privileges here? How does it work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood it last time I looked into it, but that was last year. It seems this mapping concept is not immediately intuitive so adding some comments (albeit a bit verbose) might be a good idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would just linking to x-pack/plugins/features/common/feature_kibana_privileges.ts where it's explained be OK? Rather than copying documentation between files?

AlertAdd
} from '../../../../../../../../../plugins/triggers_actions_ui/public';

type AlertAddProps = React.ComponentProps<typeof AlertAdd>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is such an underrated ts helper

parsedInterval = parseEsInterval(duration);
} catch (err) {
//
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the error be handled somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't, but I'll leave a comment there with the reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case this feels like something that shouldn't be thrown from parseEsInterval. But I can see how that is not easily changed.

value={parsedInterval?.value ?? ''}
onChange={e => onChange(`${e.target.value}${displayedUnit.current}`)}
prepend={i18n.translate(
'xpack.apm.serviceAlertTrigger.durationField..last',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This key looks off durationField..last

//
}

const displayedUnit = useRef(parsedInterval?.unit);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the ref needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To allow the user to clear the number input. I'll add a comment there as well:

  // we use a ref that we only update when we've succesfully parsed
  // the interval. this allows the user to clear the number input
  // without us having to fill it with a 0 in order for it to be
  // succesfully parsed.

Copy link
Member

@sorenlouv sorenlouv Mar 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have to fill it with a 0? Is "" (empty string) not valid?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what this is (was) about - supporting anything that is not a valid number, which is e.g. what happens when the user clears the field. Once that happens we can no longer successfully parse the interval.

In any case I'm now using the ForLastExpression that the triggers_actions_ui plugin provides.

});

// eslint-disable-next-line react-hooks/exhaustive-deps
}, []);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😬I'll add a comment.

Comment on lines 72 to 73
`service.name:${params.serviceName}`,
`transaction.type:${params.transactionType}`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use the ecs constants:

Suggested change
`service.name:${params.serviceName}`,
`transaction.type:${params.transactionType}`
`${SERVICE_NAME}:${params.serviceName}`,
`${TRANSACTION_TYPE}:${params.transactionType}`

Ntw. what do you think about renaming elasticsearch_fieldnames.ts to ecs.ts ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not all ECS (unfortunately). ie, service.environment is not in the ECS specification. Same goes for error.grouping_key and a couple of others. So maybe elasticsearch_fieldnames is good enough for now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, I'm removing the service.name and transaction.type tags. We're not doing anything wiht lookups now, and in the future (IIRC) we can use the params.

@formgeist
Copy link
Contributor

@dgieselaar I've collected some immediate design feedback and polish comments;

  1. Can we add some separation between these two form fields? 8px like the others?

Screenshot 2020-03-12 at 14 40 34

Screenshot 2020-03-12 at 14 40 43

  1. I suppose we need the transaction.type selection and tag is needed to indicate the type. Wonder if we need to make this part of the trigger definition and prefill with the selected type from the Transaction page. Secondly, is the transaction.type needed for error rate alerts?

Screenshot 2020-03-12 at 14 40 53

transaction.type tag shows for Error rate alerts.

Screenshot 2020-03-12 at 15 08 35

  1. Do we control the copy in these tooltips?

Screenshot 2020-03-12 at 14 41 08

  1. Add a spacer between the trigger definitions and the alert actions. Probably a EuiSpacer size: m.

Screenshot 2020-03-12 at 14 56 32

  1. Noticed that the Transaction duration alert has the type Error rate.

Screenshot 2020-03-12 at 14 57 58

  1. Another spacing form fields in the Error rate flyout

Screenshot 2020-03-12 at 15 23 35

Also, can you tell me how exactly we calculate the Error rate? The number of new errors over the selected time range? Not sure the label title and append copy are good enough...

Looks great otherwise 🙌

Copy link
Contributor

@YulNaumenko YulNaumenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Happy to see one more alerting integration!

Copy link
Contributor

@formgeist formgeist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor label changes that move for consistency across Observability apps.

name: i18n.translate(
'xpack.apm.serviceDetails.alertsMenu.viewActiveAlerts',
{
defaultMessage: 'View active alerts'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defaultMessage: 'View active alerts'
defaultMessage: 'Manage alerts'

const createThresholdAlertLabel = i18n.translate(
'xpack.apm.serviceDetails.alertsMenu.createThresholdAlert',
{
defaultMessage: 'Create threshold alert'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defaultMessage: 'Create threshold alert'
defaultMessage: 'Create alert'

name: i18n.translate(
'xpack.apm.serviceDetails.alertsMenu.transactionDuration',
{
defaultMessage: 'Transaction duration'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defaultMessage: 'Transaction duration'
defaultMessage: 'Transaction duration threshold'

name: i18n.translate(
'xpack.apm.serviceDetails.alertsMenu.errorRate',
{
defaultMessage: 'Error rate'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defaultMessage: 'Error rate'
defaultMessage: 'Error rate threshold'

@dgieselaar dgieselaar requested a review from a team as a code owner March 19, 2020 19:33
@dgieselaar dgieselaar removed the request for review from a team March 23, 2020 19:27
@dgieselaar dgieselaar force-pushed the error-group-alert branch 2 times, most recently from 4a9bcf5 to bd7a6f0 Compare March 23, 2020 22:20
@kibanamachine
Copy link
Contributor

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@dgieselaar dgieselaar merged commit 85c0be3 into elastic:master Mar 24, 2020
@dgieselaar dgieselaar deleted the error-group-alert branch March 24, 2020 10:15
dgieselaar added a commit to dgieselaar/kibana that referenced this pull request Mar 24, 2020
* Add alerting/actions permissions for APM

* Export TIME_UNITS, getTimeUnitLabel from triggers actions UI plugin

* Add APM alert types and UI

* Review feedback

* Use Expression components for triggers

* Update alert name for transaction duration

* Change defaults for error rate trigger
dgieselaar added a commit that referenced this pull request Mar 24, 2020
* Add alerting/actions permissions for APM

* Export TIME_UNITS, getTimeUnitLabel from triggers actions UI plugin

* Add APM alert types and UI

* Review feedback

* Use Expression components for triggers

* Update alert name for transaction duration

* Change defaults for error rate trigger
gmmorris added a commit to gmmorris/kibana that referenced this pull request Mar 24, 2020
* master:
  Updating our direct usage of https-proxy-agent to 5.0.0 (elastic#58296)
  allow users to unset the throttle of an alert (elastic#60964)
  [Lens] Fix bug in metric config panel (elastic#60982)
  [SearchProfiler] Minor fixes (elastic#60919)
  [ML] Renaming ML setup and start contracts (elastic#60980)
  introduce StartServicesAccessor type for `CoreSetup.getStartServices` (elastic#60748)
  [SIEM][Detection Engine] Add rule's notification alert type (elastic#60832)
  [APM] Re-revert "Collect telemetry about data/API performance" (elastic#61030)
  [NP] Graph: get rid of saved objects class wrapper (elastic#59917)
  [EPM] merge duplicate fields when creating index patterns (elastic#60957)
  [Uptime] Ml detection of duration anomalies (elastic#59785)
  [Alerting] removes unimplemented buttons from Alert Details page (elastic#60934)
  [skip-ci] Fix CODEOWNERS paths for the Pulse team (elastic#60944)
  [APM] Threshold alerts (elastic#59566)
  [ML] Add support for percentiles aggregation to Transform wizard (elastic#60763)
  Cahgen save object duplicate message (elastic#60901)
@cauemarcondes cauemarcondes self-assigned this Apr 6, 2020
@cauemarcondes
Copy link
Contributor

Tests ok.

@cauemarcondes cauemarcondes added the apm:test-plan-done Pull request that was successfully tested during the test plan label Apr 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
apm:test-plan-done Pull request that was successfully tested during the test plan release_note:enhancement v7.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[APM] Replace watcher with Alerting
6 participants