Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AlertManager v2 #2643

Merged
merged 50 commits into from
Aug 1, 2023
Merged

AlertManager v2 #2643

merged 50 commits into from
Aug 1, 2023

Conversation

Konstantinov-Innokentii
Copy link
Member

@Konstantinov-Innokentii Konstantinov-Innokentii commented Jul 26, 2023

Introduce AlertManager v2 integration with improved internal behaviour – it's using grouping from AlertManager, not trying to re-group alerts on OnCall side.
Existing AlertManager and Grafana Alerting integrations are marked as Legacy with options to migrate them manually now or be migrated automatically after DEPRECATION DATE(TBD).
Integration urls and public api responses stay the same both for legacy and new integrations.

@Konstantinov-Innokentii Konstantinov-Innokentii marked this pull request as ready for review July 28, 2023 05:41
@Konstantinov-Innokentii Konstantinov-Innokentii requested a review from a team as a code owner July 28, 2023 05:41
@Konstantinov-Innokentii Konstantinov-Innokentii requested review from a team July 28, 2023 05:41
Copy link
Contributor

@joeyorlando joeyorlando left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall LGTM, nice work 👍 !

I would also check w/ someone who is more familiar w/ the engine/config_integrations/*.py files and CSS.

CHANGELOG.md Outdated
@@ -25,6 +25,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Apply swap requests details to schedule events ([#2677](https://github.com/grafana/oncall/pull/2677))

- Rework of AlertManager integration ([#2643](https://github.com/grafana/oncall/pull/2643))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to "Unreleased" section

> You must have the [role of Admin][user-and-team-management] to be able to create integrations in Grafana OnCall.
> ⚠️ A note about **(Legacy)** integrations:
> We are changing internal behaviour of AlertManager integration.
> Integrations that were created before version **VERSION** are marked as **(Legacy)**.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

swap out **VERSION**

> ⚠️ A note about **(Legacy)** integrations:
> We are changing internal behaviour of AlertManager integration.
> Integrations that were created before version **VERSION** are marked as **(Legacy)**.
> These integrations are still receiving and escalating alerts but will be automatically migrated after DEPRECATION_DATE.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

swap out DEPRECATION_DATE

docs/sources/integrations/alertmanager/index.md Outdated Show resolved Hide resolved
Comment on lines 19 to 20
> Integrations that were created before version **VERSION** are marked as **(Legacy)**.
> These integrations are still receiving and escalating alerts but will be automatically migrated after DEPRECATION_DATE.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing here w/ **VERSION** and DEPRECATION_DATE

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fill them right before the release, since I'll now VERSION only then.

Comment on lines +63 to +64
value = remove_legacy_prefix(value)
return value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
value = remove_legacy_prefix(value)
return value
return remove_legacy_prefix(value)

if integration == AlertReceiveChannel.INTEGRATION_GRAFANA_ALERTING:
# TODO: probably only needs to check if unified alerting is on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this TODO still relevant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it still makes sense for Grafana Alerting v2

Comment on lines 246 to 268
return renderContent();

function renderContent() {
if (isLegacyIntegration) {
return (
<HorizontalGroup spacing="xs">
<Tooltip placement="top" content={'This integration has been deprecated, consider migrating it.'}>
<Icon name="info-circle" className="u-opacity" />
</Tooltip>
<Text type="secondary">
<span className="u-opacity">{integration?.display_name}</span>
</Text>
</HorizontalGroup>
);
}

return (
<HorizontalGroup spacing="xs">
<IntegrationLogo scale={0.08} integration={integration} />
<Text type="secondary">{integration?.display_name}</Text>
</HorizontalGroup>
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is a separate renderContent function needed here? what about just including the body of renderContent inside of renderDatasource instead of declaring a separate function that is then always invoked?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, not really. silly mistake

Comment on lines +940 to +976
{isLegacyIntegration && (
<WithPermissionControlTooltip userAction={UserActions.IntegrationsWrite}>
<div
className={cx('integration__actionItem')}
onClick={() =>
setConfirmModal({
isOpen: true,
title: 'Migrate Integration?',
body: (
<VerticalGroup spacing="lg">
<Text type="primary">
Are you sure you want to migrate <Emoji text={alertReceiveChannel.verbal_name} /> ?
</Text>

<VerticalGroup spacing="xs">
<Text type="secondary">- Integration internal behaviour will be changed</Text>
<Text type="secondary">
- Integration URL will stay the same, so no need to change {getMigrationDisplayName()}{' '}
configuration
</Text>
<Text type="secondary">
- Integration templates will be reset to suit the new payload
</Text>
<Text type="secondary">- It is needed to adjust routes manually to the new payload</Text>
</VerticalGroup>
</VerticalGroup>
),
onConfirm: onIntegrationMigrate,
dismissText: 'Cancel',
confirmText: 'Migrate',
})
}
>
Migrate
</div>
</WithPermissionControlTooltip>
)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you consider adding a link to the migration docs you added?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link is added in warning banner. This modal is a final message before pressing migrate button. I'll show a demo at weekly :)

Comment on lines 232 to 237
async migrateChannelFilter(id: AlertReceiveChannel['id']) {
return await makeRequest(`/alert_receive_channels/${id}/migrate`, {
method: 'POST',
});
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async migrateChannelFilter(id: AlertReceiveChannel['id']) {
return await makeRequest(`/alert_receive_channels/${id}/migrate`, {
method: 'POST',
});
}
migrateChannelFilter = (id: AlertReceiveChannel['id']) =>
makeRequest(`/alert_receive_channels/${id}/migrate`, {
method: 'POST',
});

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All store methods use async/await notation, so I think it's worth to discuss it with frontend team. What is the advantage of using arrow function here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a preference of shortening the outputted code, no real advantage. IMO marking it as async/await it has better readability

@Konstantinov-Innokentii Konstantinov-Innokentii merged commit 1ccb9d6 into dev Aug 1, 2023
25 checks passed
@@ -552,7 +555,12 @@ def send_demo_alert(self, payload=None):
if payload is None:
payload = self.config.example_payload

if self.has_alertmanager_payload_structure:
# TODO: AMV2: hack to keep demo alert working for integration with legacy alertmanager behaviour.
Copy link
Contributor

@iskhakov iskhakov Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide more clear instruction what should be done here and when?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I can't. We still have GRAFANA integration which might or might not have legacy alertmanager behaviour and that's why there is no clear instruction what to do. I'll remove TODO.

Comment on lines +118 to +119
# if has_legacy_prefix(integration):
# raise BadRequest(detail="This integration is deprecated")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bumping this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be removed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be removed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't use abbreviations as they are reducing code readability

alerts = request.data.get("alerts", [])

data = request.data
if "firingAlerts" not in request.data:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider adding a check for max_alerts (or limit alerts), to avoid processing alerts from misconfigured alertmanagers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't help us. If we receive request which suits our size limits, iteration over array which already in memory is almost free

@@ -1,38 +1,50 @@
# Main
enabled = True
title = "Alertmanager"
title = "AlertManager"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep "Alertmanager", this is how it called by the authors https://prometheus.io/docs/alerting/latest/alertmanager/


# Slack
# Slack templates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's be consistent with the comments


slack_image_url = None

# SMS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be consistent with the comments

@@ -65,7 +65,7 @@
FEATURE_INBOUND_EMAIL_ENABLED = getenv_boolean("FEATURE_INBOUND_EMAIL_ENABLED", default=False)
FEATURE_PROMETHEUS_EXPORTER_ENABLED = getenv_boolean("FEATURE_PROMETHEUS_EXPORTER_ENABLED", default=False)
FEATURE_WEBHOOKS_2_ENABLED = getenv_boolean("FEATURE_WEBHOOKS_2_ENABLED", default=True)
FEATURE_SHIFT_SWAPS_ENABLED = getenv_boolean("FEATURE_SHIFT_SWAPS_ENABLED", default=False)
FEATURE_SHIFT_SWAPS_ENABLED = getenv_boolean("FEATURE_SHIFT_SWAPS_ENABLED", default=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mistake

brojd pushed a commit that referenced this pull request Sep 18, 2024
Introduce AlertManager v2 integration with improved internal behaviour

it's using grouping from AlertManager, not trying to re-group alerts on
OnCall side.
Existing AlertManager and Grafana Alerting integrations are marked as
Legacy with options to migrate them manually now or be migrated
automatically after DEPRECATION DATE(TBD).
Integration urls and public api responses stay the same both for legacy
and new integrations.

---------

Co-authored-by: Rares Mardare <rares.mardare@grafana.com>
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants