-
Notifications
You must be signed in to change notification settings - Fork 291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add responders improvements #3128
Conversation
for schedule in schedules: | ||
# pass user list to list_users_to_notify_from_ical | ||
schedule_oncall_users = list_users_to_notify_from_ical(schedule, events_datetime=events_datetime) | ||
oncall_users.update({schedule.pk: schedule_oncall_users}) | ||
oncall_users.update({schedule: schedule_oncall_users}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's much easier here to just return the schedule object in case we need to reference other fields on it (like in the TeamSerializer
where we need to reference schedule.team
)
@@ -219,10 +221,24 @@ class UserView( | |||
"^username", | |||
"^slack_user_identity__cached_slack_login", | |||
"^slack_user_identity__cached_name", | |||
"^teams__name", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one of the new requirements for the user search functionality in the Direct Paging dialog box will be to search for users based on the teams they currently belong to
engine/apps/alerts/paging.py
Outdated
def check_user_availability(user: User) -> typing.List[AvailabilityWarning]: | ||
"""Check user availability to be paged. | ||
|
||
Return a warnings list indicating `error` and any additional related `data`. | ||
""" | ||
warnings: typing.List[AvailabilityWarning] = [] | ||
if not user.notification_policies.exists(): | ||
warnings.append( | ||
{ | ||
"error": PagingError.USER_HAS_NO_NOTIFICATION_POLICY, | ||
"data": {}, | ||
} | ||
) | ||
|
||
is_on_call = False | ||
schedules = OnCallSchedule.objects.filter( | ||
Q(cached_ical_file_primary__contains=user.username) | Q(cached_ical_file_primary__contains=user.email), | ||
organization=user.organization, | ||
) | ||
schedules_data: ScheduleWarnings = {} | ||
for s in schedules: | ||
# keep track of schedules and on call users to suggest if needed | ||
oncall_users = list_users_to_notify_from_ical(s) | ||
schedules_data[s.name] = set(u.public_primary_key for u in oncall_users) | ||
if user in oncall_users: | ||
is_on_call = True | ||
break | ||
|
||
if not is_on_call: | ||
# user is not on-call | ||
# TODO: check working hours | ||
warnings.append( | ||
{ | ||
"error": PagingError.USER_IS_NOT_ON_CALL, | ||
"data": {"schedules": schedules_data}, | ||
} | ||
) | ||
|
||
return warnings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we no longer need this (nor the related API endpoint) since there is now an attribute on the User
model, is_currently_oncall
, that achieves this same functionality
"username": user.username, | ||
"avatar": user.avatar_url, | ||
"avatar_full": user.avatar_full_url, | ||
"important": important, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this important
key here is really the only reason I modified this method to return typing.List[PagedUser]
instead of the original typing.List[User]
(obviously don't have the "important" context in the prior approach)
@@ -242,8 +242,6 @@ def get_limited_alerts(self, obj): | |||
alerts = obj.alerts.order_by("-pk")[:100] | |||
return AlertSerializer(alerts, many=True).data | |||
|
|||
@extend_schema_field(UserShortSerializer(many=True)) | |||
@extend_schema_field(PagedUserSerializer(many=True)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change basically just adds the "important" field on these user objects (necessary for the new mockups/UX)
schedules = ScheduleReferenceSerializer(many=True, required=False, default=list) | ||
context: SerializerContext | ||
|
||
escalation_chain_id = serializers.CharField(required=False, default=None) | ||
escalation_chain = serializers.HiddenField(default=None) # set in DirectPagingSerializer.validate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- get rid of
schedules
+escalation_chain_id
(and by virtueescalation_chain
), these concepts are no longer relevant - introduce
team
@@ -68,7 +68,7 @@ def get_warnings(self, obj): | |||
|
|||
def get_on_call_now(self, obj): | |||
# Serializer context is set here: apps.api.views.schedule.ScheduleView.get_serializer_context | |||
users = self.context["oncall_users"].get(obj.pk, []) | |||
users = self.context["oncall_users"].get(obj, []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oncall_users
set on this context object now has User
objects as keys rather than the string pks
except DirectPagingUserTeamValidationError: | ||
raise BadRequest(detail=DirectPagingUserTeamValidationError.DETAIL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validates that the user specifies alteast a team or one user (can specify both of course, but both can't be null)
if is_currently_oncall_query_param == "true": | ||
# client explicitly wants to filter out users that are on-call | ||
queryset = queryset.filter(pk__in=_get_oncall_user_ids()) | ||
elif is_currently_oncall_query_param == "false": | ||
# user explicitly wants to filter out on-call users | ||
queryset = queryset.exclude(pk__in=_get_oncall_user_ids()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure whether behaviour like this is better suited to live in the FilterSet
class? I put it here because I was unsure how, within the UserFilter
class, to get access to self.schedules_with_oncall_users
(or at a minimum, self.request.user
)
@@ -168,6 +168,7 @@ function renderFormControl( | |||
} | |||
} | |||
|
|||
// TODO: make this generic to accept form data type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@grafana/grafana-oncall-frontend is there an easy way to update a class component to be generic? (I'd prefer not to rewrite the component in this PR (as it makes use of React.Comonent.forceUpdate
))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean for the data
to provide it under <T>
or so - or some other params as well? Something like class GForm<T> extends React.Component<GFormProps<T>, {}> {
where T
will be the type for data.
I'm afraid for that is going to take some time until you figure out all the types passed from the other places such as schedules, maintenance and so on... Atm it's not very clear what their shape looks like. Feel free to open a tech debt/refactoring issue, we can work on it in the near future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. I tried making the component generic using the approach you proposed and it mostly works but it'd require a bit of refactoring on some parts of the existing component. I created this issue to refactor this in a separate PR 👍
(I'll go ahead and remove my TODO comment here)
grafana-plugin/src/components/ManualAlertGroup/ManualAlertGroup.tsx
Outdated
Show resolved
Hide resolved
"avatar": user.avatar_url, | ||
"avatar_full": user.avatar_full_url, | ||
"important": important, | ||
"teams": [{"pk": t.public_primary_key, "name": t.name} for t in user.teams.all()], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
teams
is needed here as Grafana Incident uses this for some of it's UI/UX
going to update the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes make sense to me, added a few questions/comments.
Also, wondering what the plan would be for the existing Teams implementation of the paging feature (yet to be enabled).
@classmethod | ||
def get_orgs_direct_paging_integrations( | ||
cls, organization: "Organization" | ||
) -> models.QuerySet["AlertReceiveChannel"]: | ||
return cls.objects.filter( | ||
organization=organization, | ||
integration=AlertReceiveChannel.INTEGRATION_DIRECT_PAGING, | ||
) | ||
|
||
@property | ||
def is_contactable(self) -> bool: | ||
""" | ||
Returns true if: | ||
- the integration has more than one channel filter associated with it | ||
- the default channel filter has at least one notification method specified or an escalation chain associated with it | ||
""" | ||
if self.channel_filters.count() > 1: | ||
return True | ||
|
||
default_channel_filter = self.default_channel_filter | ||
if not default_channel_filter: | ||
return False | ||
|
||
notify_via_slack = self.organization.slack_is_configured and default_channel_filter.notify_in_slack | ||
notify_via_telegram = self.organization.telegram_is_configured and default_channel_filter.notify_in_telegram | ||
|
||
notify_via_chatops = notify_via_slack or notify_via_telegram | ||
custom_messaging_backend_configured = default_channel_filter.notification_backends is not None | ||
|
||
return ( | ||
default_channel_filter.escalation_chain is not None | ||
or notify_via_chatops | ||
or custom_messaging_backend_configured | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this only makes sense for direct paging integrations? Should any other integrations is_contactable
value be True if there is some schedule or user to be notified?
@matiasb it's a new concept that, for now, is only applicable for direct paging integrations, correct. If you are concerned that this might cause confusion, I could move it instead as a utility method in |
grafana-plugin/src/components/ManualAlertGroup/ManualAlertGroup.tsx
Outdated
Show resolved
Hide resolved
grafana-plugin/src/components/ManualAlertGroup/ManualAlertGroup.tsx
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks 👍
# Which issue(s) this PR fixes Closes grafana/support-escalations#8143 Fix a few minor issues introduced in #3128: - Fix slow `GET /users` internal API endpoint related to [this change](https://github.com/grafana/oncall/blob/dev/engine/apps/api/views/user.py#L239) - Fix slow `GET /teams` internal API endpoint. Introduced a `short` query parameter that only invokes `apps.schedules.ical_utils.get_oncall_users_for_multiple_schedules` when `short=false`. - Order results from `GET /teams` internal API endpoint by name (ascending) - Fix search issue when searching for teams in the add responders popup window (this was strictly a frontend issue) - CSS changes to add responders dropdown to fix lonnnggg results list: **Before** <img width="377" alt="Screenshot 2023-10-31 at 10 06 20" src="https://github.com/grafana/oncall/assets/9406895/246c7c3b-7bea-44a1-afec-a38144c2c2d1"> **After** <img width="444" alt="Screenshot 2023-10-31 at 10 48 12" src="https://github.com/grafana/oncall/assets/9406895/b5602a22-c691-4dc7-bd3d-e4d6b76d04a0"> ## Still todo The `apps.schedules.ical_utils.get_oncall_users_for_multiple_schedules` method is still very slow when an instance has a lot of users (ex. `ops`). Ideally we should refactor this method to be more efficient because we still need to call this method under some circumstances. Ex. to populate this dropdown when Direct Paging a user (note that it didn't finish loading here on `ops`): <img width="1037" alt="Screenshot 2023-10-30 at 18 14 59" src="https://github.com/grafana/oncall/assets/9406895/9d91a57c-5db8-4ff9-862a-cd3755f52690"> ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [x] Documentation added (or `pr:no public docs` PR label added if not required) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)
# What this PR does https://www.loom.com/share/c5e10b5ec51343d0954c6f41cfd6a5fb ## Summary of backend changes - Add `AlertReceiveChannel.get_orgs_direct_paging_integrations` method and `AlertReceiveChannel.is_contactable` property. These are needed to be able to (optionally) filter down teams, in the `GET /teams` internal API endpoint ([here](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R63)), to just teams that have a "contactable" Direct Paging integration - `engine/apps/alerts/paging.py` - update these functions to support new UX. In short `direct_paging` no longer takes a list of `ScheduleNotifications` or an `EscalationChain` object - add `user_is_oncall` helper function - add `_construct_title` helper function. In short if no `title` is provided, which is the case for Direct Pages originating from OnCall (either UI or Slack), then the format is `f"{from_user.username} is paging <team.name (if team is specified> <comma separated list of user.usernames> to join escalation"` - `engine/apps/api/serializers/team.py` - add `number_of_users_currently_oncall` attribute to response schema ([code](https://github.com/grafana/oncall/pull/3128/files#diff-26af48f796c9e987a76447586dd0f92349783d6ea6a0b6039a2f0f28bd58c2ebR45-R52)) - `engine/apps/api/serializers/user.py` - add `is_currently_oncall` attribute to response schema ([code](https://github.com/grafana/oncall/pull/3128/files#diff-6744b5544ebb120437af98a996da5ad7d48ee1139a6112c7e3904010ab98f232R157-R162)) - `engine/apps/api/views/team.py` - add support for two new optional query params `only_include_notifiable_teams` and `include_no_team` ([code](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R70)) - `engine/apps/api/views/user.py` - in the `GET /users` internal API endpoint, when specifying the `search` query param now also search on `teams__name` ([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R223); this is a new UX requirement) - add support for a new optional query param, `is_currently_oncall`, to allow filtering users based on.. whether they are currently on call or not ([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R272-R282)) - remove `check_availability` endpoint (no longer used with new UX; also removed references in frontend code) - `engine/apps/slack/scenarios/paging.py` and `engine/apps/slack/scenarios/manage_responders.py` - update Slack workflows to support new UX. Schedules are no longer a concept here. When creating a new alert group via `/escalate` the user either specifies a team and/or user(s) (they must specify at least one of the two and validation is done here to check this). When adding responders to an existing alert group it's simply a list of users that they can add, no more schedules. - add `Organization.slack_is_configured` and `Organization.telegram_is_configured` properties. These are needed to support [this new functionality ](https://github.com/grafana/oncall/pull/3128/files#diff-9d96504027309f2bd1e95352bac1433b09b60eb4fafb611b52a6c15ed16cbc48R271-R272) in the `AlertReceiveChannel` model. ## Summary of frontend changes - Refactor/rename `EscalationVariants` component to `AddResponders` + remove `grafana-plugin/src/containers/UserWarningModal` (no longer needed with new UX) - Remove `grafana-plugin/src/models/user.ts` as it seemed to be a duplicate of `grafana-plugin/src/models/user/user.types.ts` Related to grafana/incident#4278 - Closes #3115 - Closes #3116 - Closes #3117 - Closes #3118 - Closes #3177 ## TODO - [x] make frontend changes - [x] update Slack backend functionality - [x] update public documentation - [x] add/update e2e tests ## Post-deploy To-dos - [ ] update dev/ops/production Slack bots to update `/escalate` command description (should now say "Direct page a team or user(s)") ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [x] Documentation added (or `pr:no public docs` PR label added if not required) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)
# Which issue(s) this PR fixes Closes grafana/support-escalations#8143 Fix a few minor issues introduced in #3128: - Fix slow `GET /users` internal API endpoint related to [this change](https://github.com/grafana/oncall/blob/dev/engine/apps/api/views/user.py#L239) - Fix slow `GET /teams` internal API endpoint. Introduced a `short` query parameter that only invokes `apps.schedules.ical_utils.get_oncall_users_for_multiple_schedules` when `short=false`. - Order results from `GET /teams` internal API endpoint by name (ascending) - Fix search issue when searching for teams in the add responders popup window (this was strictly a frontend issue) - CSS changes to add responders dropdown to fix lonnnggg results list: **Before** <img width="377" alt="Screenshot 2023-10-31 at 10 06 20" src="https://github.com/grafana/oncall/assets/9406895/246c7c3b-7bea-44a1-afec-a38144c2c2d1"> **After** <img width="444" alt="Screenshot 2023-10-31 at 10 48 12" src="https://github.com/grafana/oncall/assets/9406895/b5602a22-c691-4dc7-bd3d-e4d6b76d04a0"> ## Still todo The `apps.schedules.ical_utils.get_oncall_users_for_multiple_schedules` method is still very slow when an instance has a lot of users (ex. `ops`). Ideally we should refactor this method to be more efficient because we still need to call this method under some circumstances. Ex. to populate this dropdown when Direct Paging a user (note that it didn't finish loading here on `ops`): <img width="1037" alt="Screenshot 2023-10-30 at 18 14 59" src="https://github.com/grafana/oncall/assets/9406895/9d91a57c-5db8-4ff9-862a-cd3755f52690"> ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [x] Documentation added (or `pr:no public docs` PR label added if not required) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)
What this PR does
https://www.loom.com/share/c5e10b5ec51343d0954c6f41cfd6a5fb
Summary of backend changes
AlertReceiveChannel.get_orgs_direct_paging_integrations
method andAlertReceiveChannel.is_contactable
property. These are needed to be able to (optionally) filter down teams, in theGET /teams
internal API endpoint (here), to just teams that have a "contactable" Direct Paging integrationengine/apps/alerts/paging.py
direct_paging
no longer takes a list ofScheduleNotifications
or anEscalationChain
objectuser_is_oncall
helper function_construct_title
helper function. In short if notitle
is provided, which is the case for Direct Pages originating from OnCall (either UI or Slack), then the format isf"{from_user.username} is paging <team.name (if team is specified> <comma separated list of user.usernames> to join escalation"
engine/apps/api/serializers/team.py
- addnumber_of_users_currently_oncall
attribute to response schema (code)engine/apps/api/serializers/user.py
- addis_currently_oncall
attribute to response schema (code)engine/apps/api/views/team.py
- add support for two new optional query paramsonly_include_notifiable_teams
andinclude_no_team
(code)engine/apps/api/views/user.py
GET /users
internal API endpoint, when specifying thesearch
query param now also search onteams__name
(code; this is a new UX requirement)is_currently_oncall
, to allow filtering users based on.. whether they are currently on call or not (code)check_availability
endpoint (no longer used with new UX; also removed references in frontend code)engine/apps/slack/scenarios/paging.py
andengine/apps/slack/scenarios/manage_responders.py
- update Slack workflows to support new UX. Schedules are no longer a concept here. When creating a new alert group via/escalate
the user either specifies a team and/or user(s) (they must specify at least one of the two and validation is done here to check this). When adding responders to an existing alert group it's simply a list of users that they can add, no more schedules.Organization.slack_is_configured
andOrganization.telegram_is_configured
properties. These are needed to support this new functionality in theAlertReceiveChannel
model.Summary of frontend changes
EscalationVariants
component toAddResponders
+ removegrafana-plugin/src/containers/UserWarningModal
(no longer needed with new UX)grafana-plugin/src/models/user.ts
as it seemed to be a duplicate ofgrafana-plugin/src/models/user/user.types.ts
Related to https://github.com/grafana/incident/issues/4278
TODO
Post-deploy To-dos
/escalate
command description (should now say "Direct page a team or user(s)")Checklist
pr:no public docs
PR label added if not required)CHANGELOG.md
updated (orpr:no changelog
PR label added if not required)