[Uptime] Refactor cert alerts from batched to individual #102138

dominiqueclarke
Contributor

@dominiqueclarke dominiqueclarke commented Jun 14, 2021

Fixes #101150

This PR transitions batched certificate alerts into individual certificate alerts. Previously, a single alert was generated for all certs, listing a summary of the number of expiring and aging certs. Now there is one alert for each expired or aging cert, along with context on its status. An example of the alert content can be found below.

[image: example of the individual certificate alert content]

Here are some highlights from the implementation:

  1. Adjust the default action connector message to suit individual alerts
  2. Iterate over found certs and generate an alert instance per cert
  3. Generate cert status to populate action connector message

Note
This PR will break the current batched cert alert experience. The new alert executor function no longer provides the action variables (for example, expiring/aging count and batched summary strings) that the batched alert message needs. Conversely, the new individual alert instances will not work with the old default action message, because it would create duplicate alert strings, which the alerting framework rejects as duplicate alert instances. A migration is therefore needed to transition the default action message to the updated individual message. Users who have created custom action messages that expect the previous action variables may also need some consideration.

This PR creates a new TLS rule type tlsIndividual, while maintaining support for the old rule type. All of the old rule type server side code is maintained in a separate code path in order to keep compatibility with old batched certificate rules.

Author Checklist

  • Accessibility has been considered, relevant aria tags and other accessibility features implemented
    note: This feature is purely technical in nature, and does not directly touch UI elements that need accessibility considerations.
  • Telemetry has been added where relevant @justinkambic @shahzad31 Do we currently have telemetry in place for alerts?
  • Docs have been added to this PR covering any new, changed, or removed features
  • Testing for expected behavior is in place
    • Automated tests exist to cover expected and edge case conditions
    • User acceptance testing has been carried out to ensure the feature functions as expected within the context of how it will be used
    • Any special/edge cases that need to be manually tested must be documented
    • Ensure the new layout works responsively (including down to small phone widths, where makes sense for the user flow, e.g. the error page when reacting to an alert)
      note: This feature is purely technical in nature, and does not directly touch UI elements that need accessibility considerations.

Reviewer Checklist

  • Accessibility has been considered, relevant aria tags and other accessibility features implemented
  • Telemetry has been added where relevant
  • Docs have been added to this PR covering any new, changed, or removed features
  • Testing for expected behavior is in place
    • Automated tests exist to cover expected and edge case conditions
    • User acceptance testing has been carried out to ensure the feature functions as expected within the context of how it will be used
    • Any special/edge cases that need to be manually tested must be documented
    • Ensure the new layout works responsively (including down to small phone widths, where makes sense for the user flow, e.g. the error page when reacting to an alert)

For maintainers

@dominiqueclarke dominiqueclarke requested a review from a team as a code owner June 14, 2021 21:07
@botelastic botelastic bot added the Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability label Jun 14, 2021
@elasticmachine
Contributor

Pinging @elastic/uptime (Team:uptime)

@dominiqueclarke dominiqueclarke added Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability v7.13.0 v7.14.0 v8.0.0 enhancement New value added to drive a business result release_note:breaking auto-backport Deprecated - use backport:version if exact versions are needed and removed Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability v7.13.0 labels Jun 14, 2021
@dominiqueclarke dominiqueclarke changed the title refactor cert alerts from batched to individual [Uptime] Refactor cert alerts from batched to individual Jun 14, 2021
@dominiqueclarke
Contributor Author

@elasticmachine merge upstream

@dominiqueclarke
Contributor Author

@elasticmachine merge upstream

id: CLIENT_ALERT_TYPES.TLS_LEGACY,
iconClass: 'uptimeApp',
documentationUrl(docLinks) {
  return `${docLinks.ELASTIC_WEBSITE_URL}guide/en/uptime/${docLinks.DOC_LINK_VERSION}/uptime-alerting.html#_tls_alerts`;
Contributor

Please update the relevant docs as well, especially for the message; otherwise, create a follow-up issue for the docs team.

Contributor Author

How should we handle the deprecation of the other rule type in the docs?

@@ -8,14 +8,31 @@
import { i18n } from '@kbn/i18n';

export const TlsTranslations = {
  defaultActionMessage: i18n.translate('xpack.uptime.alerts.tls.legacy.defaultActionMessage', {
    defaultMessage: `Detected TLS certificate {commonName} from issuer {issuer} is expiring or becoming too old. Certificate {summary}
Contributor

The message should be more accurate, like we discussed offline.

)
.valueOf();
const alertInstance = alertInstanceFactory(
  `${TLS.id}-${cert.common_name}-${cert.issuer}`
Contributor

I don't think we need tls.id; common_name and issuer should be enough. tls.id will pollute the UI where we display the list of instances in the alert management UI.

Comment on lines 112 to 118
const uniqueCerts = certs.reduce<Cert[]>((uniqueCertArray, currentCert) => {
  if (!uniqueCertArray.find((cert) => cert.common_name === currentCert.common_name)) {
    uniqueCertArray.push(currentCert);
  }

  return uniqueCertArray;
}, []);
Contributor

I don't think we need this reduce. If you check the getCerts function, the ES query collapses on tls.server.hash.sha256, which means it will only return unique certificates. common_name can actually be the same across certificates; tls.server.hash.sha256 will always be different.
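
For readers unfamiliar with field collapsing, a search body of roughly this shape returns at most one hit per distinct value of the collapse field. This is only a sketch: the actual getCerts query is not shown in this thread, so the `certSearchBody` name, the `exists` filter, and the `size` value are assumptions made for illustration.

```typescript
// Sketch of an Elasticsearch search body that collapses hits on the cert hash.
// Only the `collapse` field name comes from the comment above; the rest is assumed.
const certSearchBody = {
  size: 100, // hypothetical page size
  query: {
    bool: {
      // Only consider documents that actually carry a certificate hash.
      filter: [{ exists: { field: 'tls.server.hash.sha256' } }],
    },
  },
  // One hit per distinct certificate hash, so no client-side dedupe is needed.
  collapse: { field: 'tls.server.hash.sha256' },
};
```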

Contributor Author

Thanks. I will use sha256 for the id then, as using common_name for the individual alert instance id was causing duplicate scheduling issues.

Contributor

I would say keep the common name at the start of the instance id and sha256 after that. sha256 isn't very helpful for identification.

Contributor

Maybe append common name + index or something?

Contributor Author

As discussed offline, the id will be common name + issuer + sha256.
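
A minimal sketch of that id scheme, assuming a `-` separator as in the diff excerpt earlier in this thread. The `CertIdentity` shape and the `getCertInstanceId` helper are hypothetical names used only for illustration.

```typescript
// Hypothetical subset of the cert fields needed to build an instance id.
interface CertIdentity {
  common_name: string;
  issuer: string;
  sha256: string;
}

// Readable prefix (common name, issuer) first; the sha256 suffix guarantees uniqueness.
function getCertInstanceId(cert: CertIdentity): string {
  return `${cert.common_name}-${cert.issuer}-${cert.sha256}`;
}

// Two distinct certificates that share a common name and issuer.
const renewedPair: CertIdentity[] = [
  { common_name: 'example.org', issuer: 'Example CA', sha256: 'aaa111' },
  { common_name: 'example.org', issuer: 'Example CA', sha256: 'bbb222' },
];

// With the hash in the id, both certs get their own instance id.
const instanceIds = renewedPair.map(getCertInstanceId);
// With the common name alone, the two ids would collide.
const commonNameOnlyIds = new Set(renewedPair.map((cert) => cert.common_name));
```

Here `instanceIds` holds two different strings while `commonNameOnlyIds` has a single entry, which mirrors the duplicate scheduling problem mentioned above.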

Contributor

Are we keeping this reduce? I think we don't need it.

Contributor Author

@shahzad31, you're right, my mistake.

@shahzad31
Contributor

@dominiqueclarke can we change the notify default in the alert flyout for this alert? I think by default we should set it to "Every time alert is active"; "On status change" doesn't make sense for this alert.
image

"On status change" means the user will only be notified once. I think for this alert it makes sense to remind them after an interval.

@dominiqueclarke
Contributor Author

That means you would get an action connector message each time the executor runs at the set interval, which defaults to 1 minute. That seems a bit excessive if our users aren't updating the interval. Perhaps we should also update the interval.
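
To put a rough number on that concern, here is a back-of-the-envelope sketch. It assumes one action connector message per active alert instance per executor run; `messagesPerDay` is a hypothetical helper, not part of the alerting framework.

```typescript
const MINUTES_PER_DAY = 24 * 60;

// Estimate daily messages when notifying every time an alert is active.
// Assumes one message per active instance per executor run.
function messagesPerDay(intervalMinutes: number, activeInstances: number): number {
  if (intervalMinutes <= 0) {
    throw new RangeError('intervalMinutes must be positive');
  }
  const runsPerDay = Math.floor(MINUTES_PER_DAY / intervalMinutes);
  return runsPerDay * activeInstances;
}

// One expiring cert checked every minute versus once a day.
const everyMinute = messagesPerDay(1, 1); // 1440 messages per day
const oncePerDay = messagesPerDay(MINUTES_PER_DAY, 1); // 1 message per day
```

Under these assumptions, a single expiring certificate at the 1 minute default would produce 1440 messages a day, which is why a longer interval is worth considering alongside the notify setting.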

Contributor

@shahzad31 shahzad31 left a comment

I don't have any blocking concerns besides the extra reduce method; otherwise, looks good.

Great job!!

@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
uptime 561 562 +1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
uptime 20.3KB 22.0KB +1.7KB

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@dominiqueclarke dominiqueclarke merged commit 450abab into elastic:master Jun 23, 2021
@dominiqueclarke dominiqueclarke deleted the feature/101150-individual-cert-alerts branch June 23, 2021 00:56
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Jun 23, 2021
)

* refactor cert alerts from batched to individual

* remove old translations

* create new certificate alert rule type and transition old cert rule type to legacy

* update translations

* maintain legacy tls rule UI to support legacy rule editing

* update translations

* update TLS alert content, rule type id, and alert instance id schema

* remove extraneous logic and format date content

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
@kibanamachine
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Jun 23, 2021
…103035)

* refactor cert alerts from batched to individual

* remove old translations

* create new certificate alert rule type and transition old cert rule type to legacy

* update translations

* maintain legacy tls rule UI to support legacy rule editing

* update translations

* update TLS alert content, rule type id, and alert instance id schema

* remove extraneous logic and format date content

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: Dominique Clarke <doclarke71@gmail.com>
Labels
auto-backport Deprecated - use backport:version if exact versions are needed enhancement New value added to drive a business result release_note:enhancement Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability v7.14.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Uptime] Transition batched certificate alerts to single certificate alerts
4 participants