Add responders improvements (#3128)

# What this PR does

https://www.loom.com/share/c5e10b5ec51343d0954c6f41cfd6a5fb

## Summary of backend changes

- Add `AlertReceiveChannel.get_orgs_direct_paging_integrations` method and `AlertReceiveChannel.is_contactable` property. These are needed to be able to (optionally) filter down teams, in the `GET /teams` internal API endpoint ([here](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R63)), to just teams that have a "contactable" Direct Paging integration
- `engine/apps/alerts/paging.py`
  - update these functions to support new UX. In short `direct_paging` no longer takes a list of `ScheduleNotifications` or an `EscalationChain` object
  - add `user_is_oncall` helper function
  - add `_construct_title` helper function. In short if no `title` is provided, which is the case for Direct Pages originating from OnCall (either UI or Slack), then the format is `f"{from_user.username} is paging <team.name (if team is specified> <comma separated list of user.usernames> to join escalation"`
- `engine/apps/api/serializers/team.py` - add `number_of_users_currently_oncall` attribute to response schema ([code](https://github.com/grafana/oncall/pull/3128/files#diff-26af48f796c9e987a76447586dd0f92349783d6ea6a0b6039a2f0f28bd58c2ebR45-R52))
- `engine/apps/api/serializers/user.py` - add `is_currently_oncall` attribute to response schema ([code](https://github.com/grafana/oncall/pull/3128/files#diff-6744b5544ebb120437af98a996da5ad7d48ee1139a6112c7e3904010ab98f232R157-R162))
- `engine/apps/api/views/team.py` - add support for two new optional query params `only_include_notifiable_teams` and `include_no_team` ([code](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R70))
- `engine/apps/api/views/user.py`
  - in the `GET /users` internal API endpoint, when specifying the `search` query param now also search on `teams__name` ([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R223); this is a new UX requirement)
  - add support for a new optional query param, `is_currently_oncall`, to allow filtering users based on whether they are currently on call or not ([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R272-R282))
  - remove `check_availability` endpoint (no longer used with new UX; also removed references in frontend code)
- `engine/apps/slack/scenarios/paging.py` and `engine/apps/slack/scenarios/manage_responders.py` - update Slack workflows to support new UX. Schedules are no longer a concept here. When creating a new alert group via `/escalate` the user either specifies a team and/or user(s) (they must specify at least one of the two and validation is done here to check this). When adding responders to an existing alert group it's simply a list of users that they can add, no more schedules.
- add `Organization.slack_is_configured` and `Organization.telegram_is_configured` properties. These are needed to support [this new functionality](https://github.com/grafana/oncall/pull/3128/files#diff-9d96504027309f2bd1e95352bac1433b09b60eb4fafb611b52a6c15ed16cbc48R271-R272) in the `AlertReceiveChannel` model.

## Summary of frontend changes

- Refactor/rename `EscalationVariants` component to `AddResponders` + remove `grafana-plugin/src/containers/UserWarningModal` (no longer needed with new UX)
- Remove `grafana-plugin/src/models/user.ts` as it seemed to be a duplicate of `grafana-plugin/src/models/user/user.types.ts`

Related to grafana/incident#4278

- Closes #3115
- Closes #3116
- Closes #3117
- Closes #3118
- Closes #3177

## TODO

- [x] make frontend changes
- [x] update Slack backend functionality
- [x] update public documentation
- [x] add/update e2e tests

## Post-deploy To-dos

- [ ] update dev/ops/production Slack bots to update `/escalate` command description (should now say "Direct page a team or user(s)")

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)
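
## Illustrative sketch: default title construction

The backend summary describes the default title format used when no `title` is supplied. The snippet below is a minimal sketch of that behaviour, not the code from `engine/apps/alerts/paging.py`. The `User` and `Team` dataclasses are hypothetical stand-ins for the real models, `construct_title` is an illustrative name, and joining the team and the user list with "and" is an assumption, since the description does not say how the two parts are separated. The `ValueError` mirrors the rule that at least one of team or users must be specified.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class User:
    # Hypothetical stand-in for the OnCall user model.
    username: str


@dataclass
class Team:
    # Hypothetical stand-in for the OnCall team model.
    name: str


def construct_title(from_user: User, team: Optional[Team], users: List[User]) -> str:
    """Build a default Direct Paging title when the caller supplies none."""
    if team is None and not users:
        # The PR states that a team and/or at least one user must be given.
        raise ValueError("specify a team, one or more users, or both")

    parts = []
    if team is not None:
        parts.append(team.name)
    if users:
        parts.append(", ".join(u.username for u in users))

    # Assumption: the team name and the user list are joined with "and".
    targets = " and ".join(parts)
    return f"{from_user.username} is paging {targets} to join escalation"
```

For example, `construct_title(User("alice"), Team("infra"), [User("bob")])` would give `"alice is paging infra and bob to join escalation"` under these assumptions.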
grafana · Oct 27, 2023 · 697248d
1 parent 11259de
commit 697248d
Showing 80 changed files with 4,250 additions and 2,647 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,13 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## Unreleased
+
+### Changed
+
+- Simplify Direct Paging workflow. Now when using Direct Paging you either simply specify a team, or one or more users
+  to page by @joeyorlando ([#3128](https://github.com/grafana/oncall/pull/3128))
+
 ## v1.3.47 (2023-10-25)
 
 ### Fixed

diff --git a/docs/sources/integrations/manual/index.md b/docs/sources/integrations/manual/index.md
@@ -22,58 +22,53 @@ However, sometimes you might need to page a [team][manage-teams] or request assi
 are not part of these pre-defined rules.
 
 For such ad-hoc scenarios, Grafana OnCall allows you to create an alert group, input necessary information, and decide
-who will be alerted – a team, a user, or an on-call user from a specific schedule.
+who will be alerted – a team, or a set of users.
 
 ## Page a team
 
-Click on **+ New alert group** on the **Alert groups** page to start creating a new alert group.
-From there, you can configure the alert group to notify a particular team and optionally include additional users or
-schedules. Here are the inputs you need to fill in:
+Click on **+ Escalation** on the **Alert groups** page to start creating a new alert group.
+From there, you can configure the alert group to notify a particular team and optionally include additional users. Here are the inputs you need to fill in:
 
-- **Title**: Write a brief and clear title for your alert group.
-- **Message**: Optionally, add a message to provide more details or instructions.
+- **Message**: Write a message to provide more details or instructions to those whom you are paging.
 - **Team**: Select the team you want to page. The team's
-[direct paging integration](#learn-the-flow-and-handle-warnings) will be used for notification.
-- **Additional Responders**: Optionally, include more responders for the alert group.
-These could be any combination of users and schedules.
-For each additional responder (user or schedule), you can select a notification policy: [default or important][notify].
+  [direct paging integration](#learn-the-flow-and-handle-warnings) will be used for notification. _Note_ that you will only
+  see teams that have a "contactable" direct paging integration (ie. it has an escalation chain assigned to it, or has
+  at least one Chatops integration connected to send notifications to).
+- **Users**: Include more users to the alert group. For each additional user, you can select a notification policy:
+  [default or important][notify].
 
 > The same feature is also available as [**/escalate**][slack-escalate] Slack command.
 
-## Add responders for an existing alert group
+## Add users to an existing alert group
 
-If you want to page more people for an existing alert group, you can do so using the **Notify additional responders**
-button on the specific alert group's page. Here you can select more users, or choose users who are on-call for specific
-schedules. The same functionality is available in Slack using the **Responders** button in the alert group's message.
+If you want to page more people for an existing alert group, you can do so using the **+ Add**
+button, within the "Participants" section on the specific alert group's page. The same functionality is available in
+Slack using the **Responders** button in the alert group's message.
 
-Notifying additional responders doesn't disrupt or interfere with the escalation chain configured for the alert group;
-it simply adds more responders and notifies them immediately. Note that adding responders for an existing alert group
+Notifying additional users doesn't disrupt or interfere with the escalation chain configured for the alert group;
+it simply adds more responders and notifies them immediately. Note that adding users for an existing alert group
 will page them even if the alert group is silenced or acknowledged, but not if the alert group is resolved.
 
 > It's not possible to page a team for an existing alert group. To page a specific team, you need to
-[create a new alert group](#page-a-team).
+> [create a new alert group](#page-a-team).
 
 ## Learn the flow and handle warnings
 
 When you pick a team to page, Grafana OnCall will automatically use the right direct paging integration for the team.
 "Direct paging" is a special kind of integration in Grafana OnCall that is unique per team and is used to send alerts
 to the team's ChatOps channels and start an appropriate escalation chain.
 
-If a team hasn't set up a direct paging integration, or if the integration doesn't have any escalation chains connected,
-Grafana OnCall will issue a warning. If this happens, consider
-[setting up a direct paging integration](#set-up-direct-paging-for-a-team) for the team
-(or reach out to the relevant team and suggest doing so).
-
 ## Set up direct paging for a team
 
-To create a direct paging integration for a team, click **+ New alert group** on the **Alert groups** page, choose the team,
-and create an alert group, **regardless of any warnings**. This action automatically triggers Grafana OnCall to generate
-a [direct paging integration](#learn-the-flow-and-handle-warnings) for the chosen team. Alternatively, navigate to
-the **Integrations** page and create a new integration with type "Direct paging" from there, assigning it to the team.
+By default all teams will have a direct paging integration created for them. However, these are not configured by default.
+If a team does not have their direct paging integration configured, such that it is "contactable" (ie. it has an
+escalation chain assigned to it, or has at least one Chatops integration connected to send notifications to), you will
+not be able to direct page this team. If this happens, consider following the following steps for the team (or reach out
+to the relevant team and suggest doing so).
 
-After setting up the integration, you can customize its settings, link it to an escalation chain,
-and configure associated ChatOps channels.
-To confirm that the integration is functioning as intended, [create a new alert group](#page-a-team)
+Navigate to the **Integrations** page and find the "Direct paging" integration for the team in question. From the
+integration's detail page, you can customize its settings, link it to an escalation chain, and configure associated
+ChatOps channels. To confirm that the integration is functioning as intended, [create a new alert group](#page-a-team)
 and select the same team for a test run.
 
 {{% docs/reference %}}
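
The documentation change above defines a "contactable" direct paging integration as one that has an escalation chain assigned or at least one ChatOps integration connected. The following is a rough sketch of that rule and of filtering teams by it. `DirectPagingIntegration`, its fields, and `notifiable_teams` are hypothetical names chosen for illustration; the real `AlertReceiveChannel.is_contactable` property may be implemented differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DirectPagingIntegration:
    # Hypothetical stand-in; field names are illustrative, not the real model's.
    team_name: str
    escalation_chain: Optional[str] = None
    chatops_channels: List[str] = field(default_factory=list)

    @property
    def is_contactable(self) -> bool:
        # Contactable: an escalation chain is assigned, or at least one
        # ChatOps channel is connected to receive notifications.
        return self.escalation_chain is not None or len(self.chatops_channels) > 0


def notifiable_teams(integrations: List[DirectPagingIntegration]) -> List[str]:
    """Return the names of teams whose direct paging integration is contactable."""
    return [i.team_name for i in integrations if i.is_contactable]
```

A team whose integration has neither an escalation chain nor a ChatOps channel is left out of the result, which corresponds to it not appearing in the team picker.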

diff --git a/docs/sources/open-source/_index.md b/docs/sources/open-source/_index.md
@@ -96,7 +96,7 @@ features:
       should_escape: false
     - command: /escalate
       url: <ONCALL_ENGINE_PUBLIC_URL>/slack/interactive_api_endpoint/
-      description: Direct page user(s) or schedule(s)
+      description: Direct page a team or user(s)
       should_escape: false
 oauth_config:
   redirect_urls:
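
The `get_paged_users` change in `engine/apps/alerts/models/alert_group.py` replays direct-paging and unpage log records in order to work out who is still paged. The sketch below illustrates that replay idea on plain dictionaries. `PageEvent`, `replay_paged_users`, and the `known_users` mapping are hypothetical simplifications, not part of the OnCall codebase.

```python
from typing import Dict, List, TypedDict


class PageEvent(TypedDict):
    # Hypothetical, simplified stand-in for an alert group log record.
    type: str  # "page" or "unpage"
    user: str  # public primary key of the affected user
    important: bool


def replay_paged_users(events: List[PageEvent], known_users: Dict[str, str]) -> List[dict]:
    """Replay page/unpage events in order and return the users still paged."""
    paged: List[dict] = []
    for event in events:
        username = known_users.get(event["user"])
        if username is None:
            # Unknown users are skipped, similar to a failed user_map lookup.
            continue
        if event["type"] == "page":
            paged.append(
                {"pk": event["user"], "username": username, "important": event["important"]}
            )
        else:
            # An unpage removes every earlier entry for that user.
            paged = [u for u in paged if u["pk"] != event["user"]]
    return paged
```

Processing events in order means a user who is paged, unpaged, and paged again ends up in the result exactly once, with the `important` flag from the latest page.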

diff --git a/engine/apps/alerts/models/alert_group.py b/engine/apps/alerts/models/alert_group.py
@@ -70,6 +70,16 @@ class LogRecordUser(typing.TypedDict):
     avatar_full: str
 
 
+class PagedUser(typing.TypedDict):
+    id: int
+    username: str
+    name: str
+    pk: str
+    avatar: str
+    avatar_full: str
+    important: bool
+
+
 class LogRecords(typing.TypedDict):
     time: str  # humanized delta relative to now
     action: str  # human-friendly description
@@ -509,22 +519,57 @@ def declare_incident_link(self) -> str:
     def happened_while_maintenance(self):
         return self.root_alert_group is not None and self.root_alert_group.maintenance_uuid is not None
 
-    def get_paged_users(self) -> QuerySet[User]:
+    def get_paged_users(self) -> typing.List[PagedUser]:
         from apps.alerts.models import AlertGroupLogRecord
 
-        users_ids = set()
-        for log_record in self.log_records.filter(
+        user_ids: typing.Set[str] = set()
+        users: typing.List[PagedUser] = []
+
+        log_records = self.log_records.filter(
             type__in=(AlertGroupLogRecord.TYPE_DIRECT_PAGING, AlertGroupLogRecord.TYPE_UNPAGE_USER)
-        ):
+        )
+
+        for log_record in log_records:
             # filter paging events, track still active escalations
             info = log_record.get_step_specific_info()
             user_id = info.get("user") if info else None
+            important = info.get("important") if info else None
+
             if user_id is not None:
-                users_ids.add(
+                user_ids.add(
                     user_id
-                ) if log_record.type == AlertGroupLogRecord.TYPE_DIRECT_PAGING else users_ids.discard(user_id)
+                ) if log_record.type == AlertGroupLogRecord.TYPE_DIRECT_PAGING else user_ids.discard(user_id)
+
+        user_instances = User.objects.filter(public_primary_key__in=user_ids)
+        user_map = {u.public_primary_key: u for u in user_instances}
 
-        return User.objects.filter(public_primary_key__in=users_ids)
+        # mostly doing this second loop to avoid having to query each user individually in the first loop
+        for log_record in log_records:
+            # filter paging events, track still active escalations
+            info = log_record.get_step_specific_info()
+            user_id = info.get("user") if info else None
+            important = info.get("important") if info else False
+
+            if user_id is not None and (user := user_map.get(user_id)) is not None:
+                if log_record.type == AlertGroupLogRecord.TYPE_DIRECT_PAGING:
+                    # add the user
+                    users.append(
+                        {
+                            "id": user.pk,
+                            "pk": user.public_primary_key,
+                            "name": user.name,
+                            "username": user.username,
+                            "avatar": user.avatar_url,
+                            "avatar_full": user.avatar_full_url,
+                            "important": important,
+                            "teams": [{"pk": t.public_primary_key, "name": t.name} for t in user.teams.all()],
+                        }
+                    )
+                else:
+                    # user was unpaged at some point, remove them
+                    users = [u for u in users if u["pk"] != user_id]
+
+        return users
 
     def _get_response_time(self):
         """Return response_time based on current alert group status."""