Add responders improvements (#3128)
# What this PR does

https://www.loom.com/share/c5e10b5ec51343d0954c6f41cfd6a5fb

## Summary of backend changes
- Add `AlertReceiveChannel.get_orgs_direct_paging_integrations` method
and `AlertReceiveChannel.is_contactable` property. These are needed to
be able to (optionally) filter down teams, in the `GET /teams` internal
API endpoint
([here](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R63)),
to just teams that have a "contactable" Direct Paging integration
- `engine/apps/alerts/paging.py`
  - update these functions to support the new UX. In short, `direct_paging` no
longer takes a list of `ScheduleNotifications` or an `EscalationChain`
object
  - add `user_is_oncall` helper function
  - add `_construct_title` helper function. In short, if no `title` is
provided, which is the case for Direct Pages originating from OnCall
(either UI or Slack), then the format is `f"{from_user.username} is
paging <team.name (if team is specified)> <comma separated list of
user.usernames> to join escalation"`
- `engine/apps/api/serializers/team.py` - add
`number_of_users_currently_oncall` attribute to response schema
([code](https://github.com/grafana/oncall/pull/3128/files#diff-26af48f796c9e987a76447586dd0f92349783d6ea6a0b6039a2f0f28bd58c2ebR45-R52))
- `engine/apps/api/serializers/user.py` - add `is_currently_oncall`
attribute to response schema
([code](https://github.com/grafana/oncall/pull/3128/files#diff-6744b5544ebb120437af98a996da5ad7d48ee1139a6112c7e3904010ab98f232R157-R162))
- `engine/apps/api/views/team.py` - add support for two new optional
query params `only_include_notifiable_teams` and `include_no_team`
([code](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R70))
- `engine/apps/api/views/user.py`
  - in the `GET /users` internal API endpoint, when specifying the
`search` query param, now also search on `teams__name`
([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R223);
this is a new UX requirement)
  - add support for a new optional query param, `is_currently_oncall`, to
allow filtering users based on whether they are currently on call or
not
([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R272-R282))
  - remove `check_availability` endpoint (no longer used with new UX; also
removed references in frontend code)
- `engine/apps/slack/scenarios/paging.py` and
`engine/apps/slack/scenarios/manage_responders.py` - update Slack
workflows to support new UX. Schedules are no longer a concept here.
When creating a new alert group via `/escalate`, the user specifies a
team and/or user(s); at least one of the two is required, and
validation is done here to check this. When adding responders
to an existing alert group, the user simply picks from a list of users;
schedules can no longer be added.
- add `Organization.slack_is_configured` and
`Organization.telegram_is_configured` properties. These are needed to
support
[this new functionality](https://github.com/grafana/oncall/pull/3128/files#diff-9d96504027309f2bd1e95352bac1433b09b60eb4fafb611b52a6c15ed16cbc48R271-R272)
in the `AlertReceiveChannel` model.
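
To illustrate the default title format described for `_construct_title` above, here is a minimal standalone sketch. It is not the code from this PR: the function name `construct_title`, its parameters, and the `", "` separator between the team name and the usernames are assumptions made for illustration (the description does not say how the two parts are joined). The guard mirrors the stated rule that at least one of a team or user(s) must be specified.

```python
from typing import Optional, Sequence


def construct_title(
    from_username: str,
    team_name: Optional[str],
    usernames: Sequence[str],
) -> str:
    """Build a default Direct Paging title when the caller supplies none.

    Hypothetical helper for illustration only; not the PR's `_construct_title`.
    """
    # Team name first (if a team was chosen), then each paged user's username.
    names = ([team_name] if team_name else []) + list(usernames)
    if not names:
        # At least one of team / user(s) must be specified.
        raise ValueError("specify a team and/or at least one user")
    # Assumption: the parts are joined with ", ".
    return f"{from_username} is paging {', '.join(names)} to join escalation"


print(construct_title("alice", "SRE", ["bob", "carol"]))
print(construct_title("alice", None, ["bob"]))
```

Keeping the team and user names in a single list means the same join handles the team-only, users-only, and combined cases.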

## Summary of frontend changes
- Refactor/rename `EscalationVariants` component to `AddResponders` +
remove `grafana-plugin/src/containers/UserWarningModal` (no longer
needed with new UX)
- Remove `grafana-plugin/src/models/user.ts` as it seemed to be a
duplicate of `grafana-plugin/src/models/user/user.types.ts`

Related to grafana/incident#4278

- Closes #3115
- Closes #3116
- Closes #3117
- Closes #3118 
- Closes #3177 

## TODO
- [x] make frontend changes
- [x] update Slack backend functionality
- [x] update public documentation
- [x] add/update e2e tests

## Post-deploy To-dos
- [ ] update dev/ops/production Slack bots to update `/escalate` command
description (should now say "Direct page a team or user(s)")

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
joeyorlando authored Oct 27, 2023
1 parent 11259de commit 697248d
Showing 80 changed files with 4,250 additions and 2,647 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,13 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Unreleased

### Changed

- Simplify Direct Paging workflow. Now when using Direct Paging you either simply specify a team, or one or more users
to page by @joeyorlando ([#3128](https://github.com/grafana/oncall/pull/3128))

## v1.3.47 (2023-10-25)

### Fixed
53 changes: 24 additions & 29 deletions docs/sources/integrations/manual/index.md
@@ -22,58 +22,53 @@ However, sometimes you might need to page a [team][manage-teams] or request assi
are not part of these pre-defined rules.

For such ad-hoc scenarios, Grafana OnCall allows you to create an alert group, input necessary information, and decide
who will be alerted – a team, a user, or an on-call user from a specific schedule.
who will be alerted – a team, or a set of users.

## Page a team

Click on **+ New alert group** on the **Alert groups** page to start creating a new alert group.
From there, you can configure the alert group to notify a particular team and optionally include additional users or
schedules. Here are the inputs you need to fill in:
Click on **+ Escalation** on the **Alert groups** page to start creating a new alert group.
From there, you can configure the alert group to notify a particular team and optionally include additional users. Here are the inputs you need to fill in:

- **Title**: Write a brief and clear title for your alert group.
- **Message**: Optionally, add a message to provide more details or instructions.
- **Message**: Write a message to provide more details or instructions to those whom you are paging.
- **Team**: Select the team you want to page. The team's
[direct paging integration](#learn-the-flow-and-handle-warnings) will be used for notification.
- **Additional Responders**: Optionally, include more responders for the alert group.
These could be any combination of users and schedules.
For each additional responder (user or schedule), you can select a notification policy: [default or important][notify].
[direct paging integration](#learn-the-flow-and-handle-warnings) will be used for notification. _Note_ that you will only
see teams that have a "contactable" direct paging integration (ie. it has an escalation chain assigned to it, or has
at least one Chatops integration connected to send notifications to).
- **Users**: Include more users to the alert group. For each additional user, you can select a notification policy:
[default or important][notify].

> The same feature is also available as [**/escalate**][slack-escalate] Slack command.
## Add responders for an existing alert group
## Add users to an existing alert group

If you want to page more people for an existing alert group, you can do so using the **Notify additional responders**
button on the specific alert group's page. Here you can select more users, or choose users who are on-call for specific
schedules. The same functionality is available in Slack using the **Responders** button in the alert group's message.
If you want to page more people for an existing alert group, you can do so using the **+ Add**
button, within the "Participants" section on the specific alert group's page. The same functionality is available in
Slack using the **Responders** button in the alert group's message.

Notifying additional responders doesn't disrupt or interfere with the escalation chain configured for the alert group;
it simply adds more responders and notifies them immediately. Note that adding responders for an existing alert group
Notifying additional users doesn't disrupt or interfere with the escalation chain configured for the alert group;
it simply adds more responders and notifies them immediately. Note that adding users for an existing alert group
will page them even if the alert group is silenced or acknowledged, but not if the alert group is resolved.

> It's not possible to page a team for an existing alert group. To page a specific team, you need to
[create a new alert group](#page-a-team).
> [create a new alert group](#page-a-team).
## Learn the flow and handle warnings

When you pick a team to page, Grafana OnCall will automatically use the right direct paging integration for the team.
"Direct paging" is a special kind of integration in Grafana OnCall that is unique per team and is used to send alerts
to the team's ChatOps channels and start an appropriate escalation chain.

If a team hasn't set up a direct paging integration, or if the integration doesn't have any escalation chains connected,
Grafana OnCall will issue a warning. If this happens, consider
[setting up a direct paging integration](#set-up-direct-paging-for-a-team) for the team
(or reach out to the relevant team and suggest doing so).

## Set up direct paging for a team

To create a direct paging integration for a team, click **+ New alert group** on the **Alert groups** page, choose the team,
and create an alert group, **regardless of any warnings**. This action automatically triggers Grafana OnCall to generate
a [direct paging integration](#learn-the-flow-and-handle-warnings) for the chosen team. Alternatively, navigate to
the **Integrations** page and create a new integration with type "Direct paging" from there, assigning it to the team.
By default all teams will have a direct paging integration created for them. However, these are not configured by default.
If a team does not have their direct paging integration configured, such that it is "contactable" (ie. it has an
escalation chain assigned to it, or has at least one Chatops integration connected to send notifications to), you will
not be able to direct page this team. If this happens, consider following the following steps for the team (or reach out
to the relevant team and suggest doing so).

After setting up the integration, you can customize its settings, link it to an escalation chain,
and configure associated ChatOps channels.
To confirm that the integration is functioning as intended, [create a new alert group](#page-a-team)
Navigate to the **Integrations** page and find the "Direct paging" integration for the team in question. From the
integration's detail page, you can customize its settings, link it to an escalation chain, and configure associated
ChatOps channels. To confirm that the integration is functioning as intended, [create a new alert group](#page-a-team)
and select the same team for a test run.
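
The "contactable" rule stated in the documentation above (an escalation chain is assigned, or at least one ChatOps integration is connected) can be sketched as a small predicate. This is only an illustration under assumptions: the `DirectPagingIntegration` class and its fields are hypothetical stand-ins, not the `AlertReceiveChannel` model or its `is_contactable` implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DirectPagingIntegration:
    """Hypothetical stand-in for a team's Direct Paging integration."""

    team_name: str
    has_escalation_chain: bool = False
    chatops_connections: int = 0

    @property
    def is_contactable(self) -> bool:
        # Contactable if an escalation chain is assigned, or at least one
        # ChatOps integration is connected to send notifications to.
        return self.has_escalation_chain or self.chatops_connections > 0


def notifiable_team_names(integrations: List[DirectPagingIntegration]) -> List[str]:
    """Keep only the teams whose integration is contactable."""
    return [i.team_name for i in integrations if i.is_contactable]


integrations = [
    DirectPagingIntegration("infra", has_escalation_chain=True),
    DirectPagingIntegration("web", chatops_connections=1),
    DirectPagingIntegration("data"),  # created by default, never configured
]
print(notifiable_team_names(integrations))
```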

{{% docs/reference %}}
2 changes: 1 addition & 1 deletion docs/sources/open-source/_index.md
@@ -96,7 +96,7 @@ features:
should_escape: false
- command: /escalate
url: <ONCALL_ENGINE_PUBLIC_URL>/slack/interactive_api_endpoint/
description: Direct page user(s) or schedule(s)
description: Direct page a team or user(s)
should_escape: false
oauth_config:
redirect_urls:
59 changes: 52 additions & 7 deletions engine/apps/alerts/models/alert_group.py
@@ -70,6 +70,16 @@ class LogRecordUser(typing.TypedDict):
avatar_full: str


class PagedUser(typing.TypedDict):
id: int
username: str
name: str
pk: str
avatar: str
avatar_full: str
important: bool


class LogRecords(typing.TypedDict):
time: str # humanized delta relative to now
action: str # human-friendly description
@@ -509,22 +519,57 @@ def declare_incident_link(self) -> str:
def happened_while_maintenance(self):
return self.root_alert_group is not None and self.root_alert_group.maintenance_uuid is not None

def get_paged_users(self) -> QuerySet[User]:
def get_paged_users(self) -> typing.List[PagedUser]:
from apps.alerts.models import AlertGroupLogRecord

users_ids = set()
for log_record in self.log_records.filter(
user_ids: typing.Set[str] = set()
users: typing.List[PagedUser] = []

log_records = self.log_records.filter(
type__in=(AlertGroupLogRecord.TYPE_DIRECT_PAGING, AlertGroupLogRecord.TYPE_UNPAGE_USER)
):
)

for log_record in log_records:
# filter paging events, track still active escalations
info = log_record.get_step_specific_info()
user_id = info.get("user") if info else None
important = info.get("important") if info else None

if user_id is not None:
users_ids.add(
user_ids.add(
user_id
) if log_record.type == AlertGroupLogRecord.TYPE_DIRECT_PAGING else users_ids.discard(user_id)
) if log_record.type == AlertGroupLogRecord.TYPE_DIRECT_PAGING else user_ids.discard(user_id)

user_instances = User.objects.filter(public_primary_key__in=user_ids)
user_map = {u.public_primary_key: u for u in user_instances}

return User.objects.filter(public_primary_key__in=users_ids)
# mostly doing this second loop to avoid having to query each user individually in the first loop
for log_record in log_records:
# filter paging events, track still active escalations
info = log_record.get_step_specific_info()
user_id = info.get("user") if info else None
important = info.get("important") if info else False

if user_id is not None and (user := user_map.get(user_id)) is not None:
if log_record.type == AlertGroupLogRecord.TYPE_DIRECT_PAGING:
# add the user
users.append(
{
"id": user.pk,
"pk": user.public_primary_key,
"name": user.name,
"username": user.username,
"avatar": user.avatar_url,
"avatar_full": user.avatar_full_url,
"important": important,
"teams": [{"pk": t.public_primary_key, "name": t.name} for t in user.teams.all()],
}
)
else:
# user was unpaged at some point, remove them
users = [u for u in users if u["pk"] != user_id]

return users
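
# The replay logic in `get_paged_users` above can be sketched without Django.
# This is a simplified illustration, not the method itself: `PagingEvent`,
# `replay_paged_users`, and the `users_by_pk` lookup are hypothetical names,
# and log records are reduced to (kind, user_pk, important) tuples.

```python
from typing import Dict, List, NamedTuple, Optional


class PagingEvent(NamedTuple):
    """Hypothetical, simplified stand-in for a paging log record."""

    kind: str  # "page" or "unpage"
    user_pk: Optional[str]
    important: bool = False


def replay_paged_users(
    events: List[PagingEvent], users_by_pk: Dict[str, str]
) -> List[dict]:
    """Replay page/unpage events in order; return users who are still paged."""
    paged: List[dict] = []
    for event in events:
        # Skip events with no user, or whose user is not in the lookup.
        if event.user_pk is None or event.user_pk not in users_by_pk:
            continue
        if event.kind == "page":
            paged.append(
                {
                    "pk": event.user_pk,
                    "username": users_by_pk[event.user_pk],
                    "important": event.important,
                }
            )
        else:
            # The user was unpaged at some point, so drop their entries.
            paged = [u for u in paged if u["pk"] != event.user_pk]
    return paged


events = [
    PagingEvent("page", "U1"),
    PagingEvent("page", "U2", important=True),
    PagingEvent("unpage", "U1"),
]
print(replay_paged_users(events, {"U1": "alice", "U2": "bob"}))
```

# As in the method above, the lookup table is built once up front so that no
# per-event user query is needed while replaying the events.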

def _get_response_time(self):
"""Return response_time based on current alert group status."""