feat: initial alerting support #1420
To modernize the code & API a bit, I've also performed the following actions for the new CR:
If there are reasons why these paradigms were not adopted in the first place, please let me know and I'll get this in line with other resources |
Waiting for grafana/grafana-openapi-client-go#75 to be merged before removing the draft status from the PR |
@theSuess I'm currently looking through everything, but it's not a small PR, so it takes some time ;).

RBAC
Here our CI is a bit lacking, but there is no magic to get new RBAC rules into kustomize or helm. Sadly, you will have to copy that YAML manually.

Tests
As you have seen in the code, we haven't written many tests, so I can definitely live with merging this PR without a bunch of them, even though it would of course be ideal to add some. But it would be nice to at least add an e2e test for alerts. That way we have some basic assurance that it's working.

Comment
About About |
Will it be able to provision both Grafana-managed alerts and datasource-managed alerts? |
A few minor comments, but in general I think it looks great.
I started to play around with the alertgroup and I managed to create the following issue.
Initially I thought this was related to the error I saw in Grafana about the folder missing, which isn't great, and we probably need some kind of check for that as well.

logger=ngalert t=2024-02-19T14:27:00.00159844Z level=error msg="failed to fetch alert rule namespace" err="folder not found" uid=4843de5c-4f8a-4af0-9509-23526a04faf8 org=1 namespace_uid=f9b0a98d-2ed3-45a6-9521-18679c74d4f1

I don't have any more time to look into why this error happens, but it could also be something around datasources: it seems like an alert actually wants to reach the datasource, instead of just assuming it's there the way a dashboard can.

Another thing I noticed was that we should probably bump our default Grafana instance to 10 before releasing this feature.

Another issue I faced was that I couldn't create a folder with a specific UID, which caused problems when I wanted to create the alert using the operator. If anyone else will be playing around with folders, I recommend these two curl commands.

# List all folders
curl -X GET -H Accept:application/json -H Content-Type:application/json -H "Authorization: Bearer SOOO_LONG_TOKEN" http://localhost:3000/api/folders
# Create a folder with a specific UID
curl -X POST -H Accept:application/json -H Content-Type:application/json -H "Authorization: Bearer SOOO_LONG_TOKEN" http://localhost:3000/api/folders -d '{"uid":"f9b0a98d-2ed3-45a6-9521-18679c74d4f1", "title":"foo"}'

We might need to solve folder creation similarly to how we did it for dashboards, to make sure we don't get any strange errors. Or we need to update grafanaFolder to take UID as an option so we can set it there. |
This happens when we try to update the status on a resource which has been changed by another client. It might be caused by adding the finalizer. I'll build a fix for this.

Regarding folders: what's your take on adding a Adding a UID to the folder spec would work as well. I don't think it makes sense to dynamically create folders for alert rule groups, as multiple alert rule groups can be stored in the same folder, which would require further matching logic. |
Alright, most issues ironed out. We now support a The only issue is that there is no way to enforce I've also added an e2e test & fixed the instrumentedroundtripper |
@theSuess could an option be to use validation rules instead of an admission webhook? |
oooh that's interesting - will see if I can get this working with kubebuilder |
Validation rules seem to work! Be careful with them though, if you get them wrong while having existing resources, it can be hard to get rid of them |
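As an illustration of what such a validation rule can look like with kubebuilder, here is a minimal sketch. It assumes, purely for the example, a constraint of the form "exactly one of two fields must be set"; `ExampleSpec`, `FolderUID`, and `FolderRef` are hypothetical names and not necessarily what this PR ended up with. The `XValidation` marker is read by controller-gen when generating the CRD, while `validate` only approximates the same rule in plain Go so it can be run locally.

```go
package main

import (
	"errors"
	"fmt"
)

// ExampleSpec is a hypothetical spec, not the operator's actual type.
// The marker below asks controller-gen to emit a CEL validation rule
// into the generated CRD schema.
//
// +kubebuilder:validation:XValidation:rule="(has(self.folderUID) && !has(self.folderRef)) || (!has(self.folderUID) && has(self.folderRef))",message="exactly one of folderUID or folderRef must be set"
type ExampleSpec struct {
	// +optional
	FolderUID string `json:"folderUID,omitempty"`
	// +optional
	FolderRef string `json:"folderRef,omitempty"`
}

// validate approximates the CEL rule in plain Go. Unlike has() in CEL,
// it treats an empty string the same as an unset field.
func validate(s ExampleSpec) error {
	if (s.FolderUID != "") == (s.FolderRef != "") {
		return errors.New("exactly one of folderUID or folderRef must be set")
	}
	return nil
}

func main() {
	fmt.Println(validate(ExampleSpec{FolderUID: "abc"}))
	fmt.Println(validate(ExampleSpec{FolderUID: "abc", FolderRef: "my-folder"}))
}
```

The warning above still applies: a rule that rejects objects already stored in the cluster can block later updates to them, including the update that removes a finalizer, so it is worth testing rules against existing resources first.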
@lsoica this will only support grafana managed alerts. Datasource managed alerts are much more complex and would require a direct connection to the datasource which is out of scope for the operator. |
Regarding the folder selector: maybe we can simplify this to just be a Thoughts? |
Agreed, it sounds very reasonable. |
Switched over to |
Nit
return ctrl.Result{}, nil
}
controllerLog.Error(err, "error getting grafana folder cr")
return ctrl.Result{RequeueAfter: RequeueDelay}, err
Started reading the finalizer docs https://book.kubebuilder.io/reference/using-finalizers
I wonder if we shouldn't use return ctrl.Result{}, client.IgnoreNotFound(err) instead.
I took the liberty of changing a few labels and running some make commands to be more consistent with the current setup. |
Hey @theSuess @NissesSenap ,
Wow, just checked out your PR for alerting in Grafana Operator – absolutely killer work! 🚀 You guys really knocked it out of the park. Integrating alert management like this is a game-changer for us Kubernetes folks. Mad props for the effort and genius you poured into this. Can't wait to see what you hack together next!
Cheers,
Currently testing this using the provided example. I saw that the alerting rule group got applied:
however, in Grafana I don't see it: Am I missing something? |
this alleviates some race conditions when setting finalizers & status concurrently
this prevents invisible orphaned alert rules (and helps debugging)
Matching multiple folders does not make sense as alert rules can only be stored in one folder
this reduces load on the operator and eases deletion when no instance can be found
LGTM, great job
I've tried to implement a safeguard for alerting on older Grafana versions but was blocked by not being able to infer the version of the instance. Tracking this in #1451 |
This PR adds initial alerting support as described in the Alerting Support Proposal.
To keep changes small, this PR only adds the AlertRuleGroup custom resource. This is also the first resource implemented using the new auto-generated client (relevant for #1357).