Add responders improvements #3128

Merged on Oct 27, 2023 (48 commits)
Changes from 43 commits

Commits
4d6794b
WIP direct paging changes
joeyorlando Oct 5, 2023
4bfd384
Merge branch 'dev' into jorlando/direct-paging
joeyorlando Oct 5, 2023
0d8b632
update tests
joeyorlando Oct 5, 2023
4a74bff
WIP
joeyorlando Oct 6, 2023
dc058c2
WIP
joeyorlando Oct 6, 2023
3d80dbf
WIP
joeyorlando Oct 7, 2023
25c7912
WIP
joeyorlando Oct 17, 2023
c8581b0
WIP
joeyorlando Oct 17, 2023
748791c
WIP
joeyorlando Oct 19, 2023
50bd4e7
update some unit tests
joeyorlando Oct 19, 2023
cd64267
add some more tests
joeyorlando Oct 19, 2023
8146c47
add more tests
joeyorlando Oct 19, 2023
0f86f12
allow filtering users by is_currently_oncall
joeyorlando Oct 19, 2023
4bb90c3
allow filtering by is_currently_oncall=false
joeyorlando Oct 19, 2023
2f72920
WIP
joeyorlando Oct 20, 2023
2ba681a
WIP
joeyorlando Oct 20, 2023
0bf9323
WIP
joeyorlando Oct 20, 2023
a46aede
WIP
joeyorlando Oct 23, 2023
5d98949
WIP
joeyorlando Oct 23, 2023
3f30e76
style add responders popup
joeyorlando Oct 23, 2023
8899845
update backend unit tests
joeyorlando Oct 23, 2023
4bcc713
update some more backend tests
joeyorlando Oct 24, 2023
8a1c17b
add title attribute to direct paging endpoint
joeyorlando Oct 24, 2023
2019eee
add more backend tests
joeyorlando Oct 24, 2023
fd5f5d8
more tests + UI styling changes
joeyorlando Oct 24, 2023
2b932c8
Merge branch 'dev' into jorlando/direct-paging
joeyorlando Oct 24, 2023
f51c735
add more frontend unit tests
joeyorlando Oct 24, 2023
28f33f4
update changelog
joeyorlando Oct 24, 2023
d2c2ad5
address PR comments
joeyorlando Oct 25, 2023
fc89fe8
revert change to GForm
joeyorlando Oct 25, 2023
0ac4521
add teams to objects in paged_users
joeyorlando Oct 25, 2023
a05e691
update public documentation
joeyorlando Oct 26, 2023
3808099
Merge branch 'dev' into jorlando/direct-paging
joeyorlando Oct 26, 2023
88fbfc6
Merge branch 'jorlando/direct-paging' of github.com:grafana/oncall in…
joeyorlando Oct 26, 2023
a21492a
disable submit button if form is not valid or user
joeyorlando Oct 26, 2023
8d80c8e
simplify optional prop null check
joeyorlando Oct 26, 2023
d0b0266
final frontend changes + update e2e tests
joeyorlando Oct 26, 2023
7140308
remove test.only
joeyorlando Oct 26, 2023
9acab65
update swagger UI in a subsequent PR
joeyorlando Oct 26, 2023
c5ee288
Merge branch 'dev' into jorlando/direct-paging
joeyorlando Oct 26, 2023
51fdb26
address failing build
joeyorlando Oct 26, 2023
d0c5b0e
Merge branch 'jorlando/direct-paging' of github.com:grafana/oncall in…
joeyorlando Oct 26, 2023
02d73c3
add unit tests for DirectPagingStore
joeyorlando Oct 26, 2023
cc5388a
address PR comments
joeyorlando Oct 27, 2023
a3b69fc
address frontend PR comments
joeyorlando Oct 27, 2023
3ac88ad
address more PR comments
joeyorlando Oct 27, 2023
9855f6f
remove unused method
joeyorlando Oct 27, 2023
500a696
final PR comments
joeyorlando Oct 27, 2023
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,13 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## Unreleased
+
+### Changed
+
+- Simplify Direct Paging workflow. Now when using Direct Paging you either simply specify a team, or one or more users
+  to page by @joeyorlando ([#3128](https://github.com/grafana/oncall/pull/3128))
+
 ## v1.3.47 (2023-10-25)

 ### Fixed
53 changes: 24 additions & 29 deletions docs/sources/integrations/manual/index.md
@@ -22,58 +22,53 @@ However, sometimes you might need to page a [team][manage-teams] or request assi
are not part of these pre-defined rules.

For such ad-hoc scenarios, Grafana OnCall allows you to create an alert group, input necessary information, and decide
who will be alerted – a team, a user, or an on-call user from a specific schedule.
who will be alerted – a team, or a set of users.

## Page a team

Click on **+ New alert group** on the **Alert groups** page to start creating a new alert group.
From there, you can configure the alert group to notify a particular team and optionally include additional users or
schedules. Here are the inputs you need to fill in:
Click on **+ Escalation** on the **Alert groups** page to start creating a new alert group.
From there, you can configure the alert group to notify a particular team and optionally include additional users. Here are the inputs you need to fill in:

- **Title**: Write a brief and clear title for your alert group.
- **Message**: Optionally, add a message to provide more details or instructions.
- **Message**: Write a message to provide more details or instructions to those whom you are paging.
- **Team**: Select the team you want to page. The team's
[direct paging integration](#learn-the-flow-and-handle-warnings) will be used for notification.
- **Additional Responders**: Optionally, include more responders for the alert group.
These could be any combination of users and schedules.
For each additional responder (user or schedule), you can select a notification policy: [default or important][notify].
[direct paging integration](#learn-the-flow-and-handle-warnings) will be used for notification. _Note_ that you will only
see teams that have a "contactable" direct paging integration (i.e. it has an escalation chain assigned to it, or has
at least one ChatOps integration connected to send notifications to).
- **Users**: Add more users to the alert group. For each additional user, you can select a notification policy:
[default or important][notify].

> The same feature is also available as [**/escalate**][slack-escalate] Slack command.

## Add responders for an existing alert group
## Add users to an existing alert group

If you want to page more people for an existing alert group, you can do so using the **Notify additional responders**
button on the specific alert group's page. Here you can select more users, or choose users who are on-call for specific
schedules. The same functionality is available in Slack using the **Responders** button in the alert group's message.
If you want to page more people for an existing alert group, you can do so using the **+ Add**
button, within the "Participants" section on the specific alert group's page. The same functionality is available in
Slack using the **Responders** button in the alert group's message.

Notifying additional responders doesn't disrupt or interfere with the escalation chain configured for the alert group;
it simply adds more responders and notifies them immediately. Note that adding responders for an existing alert group
Notifying additional users doesn't disrupt or interfere with the escalation chain configured for the alert group;
it simply adds more responders and notifies them immediately. Note that adding users for an existing alert group
will page them even if the alert group is silenced or acknowledged, but not if the alert group is resolved.

> It's not possible to page a team for an existing alert group. To page a specific team, you need to
[create a new alert group](#page-a-team).
> [create a new alert group](#page-a-team).

## Learn the flow and handle warnings

When you pick a team to page, Grafana OnCall will automatically use the right direct paging integration for the team.
"Direct paging" is a special kind of integration in Grafana OnCall that is unique per team and is used to send alerts
to the team's ChatOps channels and start an appropriate escalation chain.

If a team hasn't set up a direct paging integration, or if the integration doesn't have any escalation chains connected,
Grafana OnCall will issue a warning. If this happens, consider
[setting up a direct paging integration](#set-up-direct-paging-for-a-team) for the team
(or reach out to the relevant team and suggest doing so).

## Set up direct paging for a team

To create a direct paging integration for a team, click **+ New alert group** on the **Alert groups** page, choose the team,
and create an alert group, **regardless of any warnings**. This action automatically triggers Grafana OnCall to generate
a [direct paging integration](#learn-the-flow-and-handle-warnings) for the chosen team. Alternatively, navigate to
the **Integrations** page and create a new integration with type "Direct paging" from there, assigning it to the team.
By default, every team has a direct paging integration created for it, but that integration is not configured.
If a team's direct paging integration is not configured to be "contactable" (i.e. it has an
escalation chain assigned to it, or has at least one ChatOps integration connected to send notifications to), you will
not be able to direct page this team. If this happens, consider following the steps below for the team (or reach out
to the relevant team and suggest doing so).

After setting up the integration, you can customize its settings, link it to an escalation chain,
and configure associated ChatOps channels.
To confirm that the integration is functioning as intended, [create a new alert group](#page-a-team)
Navigate to the **Integrations** page and find the "Direct paging" integration for the team in question. From the
integration's detail page, you can customize its settings, link it to an escalation chain, and configure associated
ChatOps channels. To confirm that the integration is functioning as intended, [create a new alert group](#page-a-team)
and select the same team for a test run.

{{% docs/reference %}}
2 changes: 1 addition & 1 deletion docs/sources/open-source/_index.md
@@ -96,7 +96,7 @@ features:
should_escape: false
- command: /escalate
url: <ONCALL_ENGINE_PUBLIC_URL>/slack/interactive_api_endpoint/
description: Direct page user(s) or schedule(s)
description: Direct page a team or user(s)
should_escape: false
oauth_config:
redirect_urls:
59 changes: 52 additions & 7 deletions engine/apps/alerts/models/alert_group.py
@@ -70,6 +70,16 @@ class LogRecordUser(typing.TypedDict):
     avatar_full: str


+class PagedUser(typing.TypedDict):
+    id: int
+    username: str
+    name: str
+    pk: str
+    avatar: str
+    avatar_full: str
+    important: bool
+
+
 class LogRecords(typing.TypedDict):
     time: str  # humanized delta relative to now
     action: str  # human-friendly description
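
The `important` flag on `PagedUser` is what lets a consumer tell which notification policy each user was paged with. A minimal sketch of such a consumer, assuming the same field names as the TypedDict above; `usernames_by_policy` is a hypothetical helper and not part of this PR:

```python
import typing


class PagedUser(typing.TypedDict):
    # Mirrors the fields declared in the diff above.
    id: int
    username: str
    name: str
    pk: str
    avatar: str
    avatar_full: str
    important: bool


def usernames_by_policy(users: typing.List[PagedUser]) -> typing.Dict[str, typing.List[str]]:
    """Group usernames under the notification policy they were paged with."""
    grouped: typing.Dict[str, typing.List[str]] = {"default": [], "important": []}
    for user in users:
        # "important" is True when the user was paged with the important policy.
        policy = "important" if user["important"] else "default"
        grouped[policy].append(user["username"])
    return grouped
```

For two paged users, one with each policy, the helper returns one username under each key.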
@@ -509,22 +519,57 @@ def declare_incident_link(self) -> str:
     def happened_while_maintenance(self):
         return self.root_alert_group is not None and self.root_alert_group.maintenance_uuid is not None

-    def get_paged_users(self) -> QuerySet[User]:
+    def get_paged_users(self) -> typing.List[PagedUser]:
         from apps.alerts.models import AlertGroupLogRecord

-        users_ids = set()
-        for log_record in self.log_records.filter(
+        user_ids: typing.Set[str] = set()
+        users: typing.List[PagedUser] = []
+
+        log_records = self.log_records.filter(
             type__in=(AlertGroupLogRecord.TYPE_DIRECT_PAGING, AlertGroupLogRecord.TYPE_UNPAGE_USER)
-        ):
+        )
+
+        for log_record in log_records:
             # filter paging events, track still active escalations
             info = log_record.get_step_specific_info()
             user_id = info.get("user") if info else None
+            important = info.get("important") if info else None
+
             if user_id is not None:
-                users_ids.add(
+                user_ids.add(
                     user_id
-                ) if log_record.type == AlertGroupLogRecord.TYPE_DIRECT_PAGING else users_ids.discard(user_id)
+                ) if log_record.type == AlertGroupLogRecord.TYPE_DIRECT_PAGING else user_ids.discard(user_id)
+
+        user_instances = User.objects.filter(public_primary_key__in=user_ids)
+        user_map = {u.public_primary_key: u for u in user_instances}

-        return User.objects.filter(public_primary_key__in=users_ids)
+        # mostly doing this second loop to avoid having to query each user individually in the first loop
+        for log_record in log_records:
+            # filter paging events, track still active escalations
+            info = log_record.get_step_specific_info()
+            user_id = info.get("user") if info else None
+            important = info.get("important") if info else False
+
+            if user_id is not None and (user := user_map.get(user_id)) is not None:
+                if log_record.type == AlertGroupLogRecord.TYPE_DIRECT_PAGING:
+                    # add the user
+                    users.append(
+                        {
+                            "id": user.pk,
+                            "pk": user.public_primary_key,
+                            "name": user.name,
+                            "username": user.username,
+                            "avatar": user.avatar_url,
+                            "avatar_full": user.avatar_full_url,
+                            "important": important,
Review comment (Contributor Author): this `important` key here is really the only reason I modified this method to return `typing.List[PagedUser]` instead of the original `typing.List[User]` (obviously we don't have the "important" context in the prior approach)
+                            "teams": [{"pk": t.public_primary_key, "name": t.name} for t in user.teams.all()],
Review comment (Contributor Author): `teams` is needed here as Grafana Incident uses this for some of its UI/UX
+                        }
+                    )
+                else:
+                    # user was unpaged at some point, remove them
+                    users = [u for u in users if u["pk"] != user_id]
+
+        return users

     def _get_response_time(self):
         """Return response_time based on current alert group status."""
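
The two-pass replay in `get_paged_users` can be tried outside Django. A minimal sketch, assuming log records are plain `(type, user_pk, important)` tuples and users are a dict of public primary key to name; `PAGE`, `UNPAGE`, `LogRecord`, and `replay_paged_users` are hypothetical names introduced for illustration, not part of this PR:

```python
import typing

PAGE = "direct_paging"  # stand-in for AlertGroupLogRecord.TYPE_DIRECT_PAGING
UNPAGE = "unpage_user"  # stand-in for AlertGroupLogRecord.TYPE_UNPAGE_USER

LogRecord = typing.Tuple[str, str, bool]  # (record type, user pk, important)


def replay_paged_users(
    log_records: typing.Sequence[LogRecord],
    known_users: typing.Dict[str, str],
) -> typing.List[typing.Dict[str, typing.Any]]:
    """Replay page/unpage events in order and return the users still paged."""
    # First pass: collect the pks that end up paged, so the real method can
    # fetch all of those users with a single query.
    active_pks: typing.Set[str] = set()
    for record_type, user_pk, _ in log_records:
        if record_type == PAGE:
            active_pks.add(user_pk)
        else:
            active_pks.discard(user_pk)

    # Stand-in for the single User query plus the pk -> user map.
    user_map = {pk: name for pk, name in known_users.items() if pk in active_pks}

    # Second pass: build the result, keeping each page event's "important" flag.
    users: typing.List[typing.Dict[str, typing.Any]] = []
    for record_type, user_pk, important in log_records:
        if user_pk not in user_map:
            continue
        if record_type == PAGE:
            users.append({"pk": user_pk, "name": user_map[user_pk], "important": important})
        else:
            # The user was unpaged at some point; drop their earlier entries.
            users = [u for u in users if u["pk"] != user_pk]
    return users
```

Like the loop it mirrors, the sketch appends one entry per page event, so a user who is paged twice without an unpage in between appears twice in the result.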
35 changes: 35 additions & 0 deletions engine/apps/alerts/models/alert_receive_channel.py
@@ -245,6 +245,41 @@ def create(cls, **kwargs):
         channel.save()
         return channel

+    @classmethod
+    def get_orgs_direct_paging_integrations(
+        cls, organization: "Organization"
+    ) -> models.QuerySet["AlertReceiveChannel"]:
+        return cls.objects.filter(
+            organization=organization,
+            integration=AlertReceiveChannel.INTEGRATION_DIRECT_PAGING,
+        )
+
+    @property
+    def is_contactable(self) -> bool:
+        """
+        Returns true if:
+        - the integration has more than one channel filter associated with it
+        - the default channel filter has at least one notification method specified or an escalation chain associated with it
+        """
+        if self.channel_filters.count() > 1:
+            return True
+
+        default_channel_filter = self.default_channel_filter
+        if not default_channel_filter:
+            return False
+
+        notify_via_slack = self.organization.slack_is_configured and default_channel_filter.notify_in_slack
+        notify_via_telegram = self.organization.telegram_is_configured and default_channel_filter.notify_in_telegram
+
+        notify_via_chatops = notify_via_slack or notify_via_telegram
+        custom_messaging_backend_configured = default_channel_filter.notification_backends is not None
+
+        return (
+            default_channel_filter.escalation_chain is not None
+            or notify_via_chatops
+            or custom_messaging_backend_configured
+        )
+
Review comment (Contributor Author): this is needed to be able to (optionally) filter down teams, in the `GET /teams` internal API endpoint, to just teams that have a "contactable" Direct Paging integration
Review comment (Contributor): So this only makes sense for direct paging integrations? Should any other integration's `is_contactable` value be True if there is some schedule or user to be notified?
     def delete(self):
         self.deleted_at = timezone.now()
         self.save()
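
The decision made by `is_contactable` can be restated as a plain function, which makes the individual cases easy to check. A minimal sketch, assuming a hypothetical `ChannelFilter` dataclass in place of the Django model and passing the organization's Slack/Telegram flags as arguments; none of these names come from the PR:

```python
import dataclasses
import typing


@dataclasses.dataclass
class ChannelFilter:
    # Hypothetical stand-in for an integration's default channel filter.
    escalation_chain: typing.Optional[str] = None
    notify_in_slack: bool = False
    notify_in_telegram: bool = False
    notification_backends: typing.Optional[dict] = None


def is_contactable(
    channel_filter_count: int,
    default_filter: typing.Optional[ChannelFilter],
    slack_is_configured: bool = False,
    telegram_is_configured: bool = False,
) -> bool:
    """Mirror the checks in the property above on plain values."""
    # More than one channel filter counts as contactable on its own.
    if channel_filter_count > 1:
        return True
    if default_filter is None:
        return False

    # A ChatOps toggle only counts if the organization has that tool configured.
    notify_via_slack = slack_is_configured and default_filter.notify_in_slack
    notify_via_telegram = telegram_is_configured and default_filter.notify_in_telegram

    return (
        default_filter.escalation_chain is not None
        or notify_via_slack
        or notify_via_telegram
        or default_filter.notification_backends is not None
    )
```

In this sketch, as in the property shown in the diff, `notification_backends` is only compared against `None`, so an empty mapping still counts as configured.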