Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor events and introduce v1beta3 API for Alert and Provider #540

Merged
merged 9 commits into from
Nov 28, 2023

Conversation

darkowlzz
Copy link
Contributor

@darkowlzz darkowlzz commented Jun 1, 2023

Refer #626 for more details.

Based on a lot of observations from v1beta2 API discussion in #435, it became apparent that Alert and Provider can be static data objects without reconcilers and status subresource.

These changes are based on the experiments from last November, refer https://github.com/fluxcd/notification-controller/commits/refactor-event-handler, updated to include all the latest changes.

  • Add Alert and Provider v1beta3 APIs which make them static API objects removing the need for any reconciler.
  • Remove the old Alert and Provider reconcilers.
  • Refactor event handler, splitting it into smaller functions with unit test and integration test of the event server. Also, the event handlers emit k8s events on the respective Alert object for the events.
  • Add new Alert and Provider reconcilers to help in migration to static API objects. This is needed to remove the finalizer set by the previous reconcilers. Stale finalizers would interfere with object deletion. Since the new API don't have status, the status from the old objects are automatically dropped when the new API is used. These are temporary migration reconcilers and will be removed in a subsequent release. (NOTE: These new reconcilers are added separately, not on top of the previous reconcilers. Reading commit-by-commit would make it easier to understand the changes.)
  • Add new v1beta3 spec docs.

Copy link
Collaborator

@matheuscscp matheuscscp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first pass LGTM

config/rbac/role.yaml Outdated Show resolved Hide resolved
internal/server/event_server_test.go Show resolved Hide resolved
@darkowlzz
Copy link
Contributor Author

These changes depended on the custom metrics that was release in Flux v2.1.0 for the static Alert and Provider metrics. It's no longer blocked now.

I've rebased and tried my best to go through all the recent changes to make sure I haven't missed anything that was added as these changes completely change the alert and provider related code. But still, it'd be great if others look out for things I may have missed. I'll also continue looking for anything missing. As it has been a few months since these changes were written, I'll be testing more and reevaluating if everything is working as they are supposed to be. Marking the PR ready for review.

❔ ❓ I need some help in deciding what to do with the deprecated caFile field support for TLS Secret data in Provider. It was deprecated in v1.1.0 (Flux v2.1.0) in v1beta2 API. This PR introduces v1beta3 API and we can drop the support for it. In the current code, I've kept the support with all the tests. If we decide to drop the support with v1beta3 API, I can remove all the related code.

@darkowlzz darkowlzz marked this pull request as ready for review September 14, 2023 21:02
@darkowlzz darkowlzz added area/alerting Alerting related issues and PRs area/api API related issues and pull requests labels Sep 14, 2023
@stefanprodan
Copy link
Member

@darkowlzz I'm for dropping caFile support in v1beta3, please go ahead and delete the code and related docs if any.

@darkowlzz
Copy link
Contributor Author

darkowlzz commented Oct 25, 2023

I need some help in deciding what to do with the deprecated caFile field support for TLS Secret data in Provider. It was #597 in v1.1.0 (Flux v2.1.0) in v1beta2 API. This PR introduces v1beta3 API and we can drop the support for it. In the current code, I've kept the support with all the tests. If we decide to drop the support with v1beta3 API, I can remove all the related code.

We had a discussion about this with all the maintainers and it was decided to keep this support for now and drop it in v1 API of the Provider. I'll add a TODO in the code to remove it in v1.

@darkowlzz darkowlzz force-pushed the refactor-events branch 2 times, most recently from a2d4a39 to 9037fdd Compare October 25, 2023 15:35
Copy link
Member

@stefanprodan stefanprodan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mark the v1beta2 APIs as deprecated.

From docs:

A beta version API becomes deprecated once a subsequent beta or stable API version is released. A deprecated beta version is subject to removal after a six-months period.

@darkowlzz
Copy link
Contributor Author

Please mark the v1beta2 APIs as deprecated.

Marked both v1beta1 and v1beta2 Alert and Provider APIs as deprecated.

Copy link
Member

@stefanprodan stefanprodan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks @darkowlzz 🥇

PS. Please create a PR in website to move the API docs to v1beta3 and update all docs to the new version.

@stefanprodan
Copy link
Member

@darkowlzz please integrate #639 and make the correction in docs to

[Bitbucket Server/Data Center](#bitbucket-serverdata-center)

@darkowlzz
Copy link
Contributor Author

darkowlzz commented Nov 21, 2023

Pushed a fix for log in the new event handler. A post request failure was logged without any context before:

{
  "level": "error",
  "ts": "2023-11-21T16:50:24.582Z",
  "msg": "failed to send notification",
  "error": "postMessage failed: request failed with status code 403, invalid_token"
}

This was because of using the wrong context in the logger. Using the proper context fixes it with rich contextual log:

{
  "level": "error",
  "ts": "2023-11-21T16:58:07.483Z",
  "logger": "event-server",
  "msg": "failed to send notification",
  "eventInvolvedObject": {
    "kind": "HelmRepository",
    "namespace": "default",
    "name": "podinfo",
    "uid": "4892228c-eed5-4dc7-8969-94e51d0df5e7",
    "apiVersion": "source.toolkit.fluxcd.io/v1beta2",
    "resourceVersion": "991119"
  },
  "Alert": {
    "name": "default-alerts",
    "namespace": "default"
  },
  "error": "postMessage failed: request failed with status code 403, invalid_token"
}

@darkowlzz darkowlzz requested a review from a team November 21, 2023 17:23
@darkowlzz
Copy link
Contributor Author

darkowlzz commented Nov 21, 2023

please integrate #639 and make the correction in docs to

Rebased and updated the v1beta3 docs with Bitbucket server changes and correction.

@darkowlzz darkowlzz force-pushed the refactor-events branch 2 times, most recently from dd47b7a to 7c1b417 Compare November 22, 2023 12:33
v1beta3 API for Alert and Provider makes them static objects, removing
the status subresource and spec fields that are relevant to dynamic
objects with reconcilers.

Signed-off-by: Sunny <darkowlzz@protonmail.com>
In v1beta3 API, Alert and Provider are static objects and don't need
reconcilers.

Signed-off-by: Sunny <darkowlzz@protonmail.com>
- Break down the EventServer.handleEvent() implementation into multiple
  smaller functions which are extensively tested on their own.
  - New implementation of filter Alerts for Event
  - New implementation of Event matches Alert
- Remove any readiness check on Alert or Provider.
- Add kubebuilder marker for generating RBAC permissions to create and
  patch events, and query Alert and Provider objects.
- Convert the event handler test from controllers/ dir to work with
  just EventServer without any reconciler, keeping all the test cases
  and slightly modified test set up code.

Signed-off-by: Sunny <darkowlzz@protonmail.com>
Emit events in the event handler along with logs on the respective alert
to make the message visible on the alert it belongs to.

Signed-off-by: Sunny <darkowlzz@protonmail.com>
Add new Alert and Provider reconcilers to perform migration to static
objects. The new Alert and Provider APIs don't contain any status. When
the existing Alerts and Providers are queries using the new API client,
the status would be dropped. A subsequent write of the object to update
the object in api-server will migrate the objects to the new version and
drop the status.
For the stale finalizers on the objects, the new reconcilers ensure that
the finalizers get removed.

Signed-off-by: Sunny <darkowlzz@protonmail.com>
The new alert and provider don't have any status. Remove any status
related checks from workflows/e2e tests.

Signed-off-by: Sunny <darkowlzz@protonmail.com>
Signed-off-by: Sunny <darkowlzz@protonmail.com>
Signed-off-by: Sunny <darkowlzz@protonmail.com>
Use the context containing proper information about the event for
logging. Previously, the logged error didn't contain any information
about the event, alert or the involved object.

Signed-off-by: Sunny <darkowlzz@protonmail.com>
@darkowlzz darkowlzz merged commit 78f5509 into main Nov 28, 2023
6 checks passed
@darkowlzz darkowlzz deleted the refactor-events branch November 28, 2023 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/alerting Alerting related issues and PRs area/api API related issues and pull requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants