
Native Kubernetes secrets for authorization #130

Closed

Conversation

TheSpiritXIII
Member

@TheSpiritXIII TheSpiritXIII commented Jan 12, 2024

This PR adds support for fetching native Kubernetes secrets to Prometheus for authorization.

The plan is that this PR would eventually be superseded by upstream efforts (see prometheus/alertmanager#3108 (comment)).

This is a slimmed-down version of what could eventually become an upstream change. While some of the code uses a generic SecretProvider interface, the new configs are hard-coded to Kubernetes: a secrets config (which maps a secret reference name to a Kubernetes secret) and a new kubernetes_sp_config (which configures the connection to the Kubernetes API server). That secret reference name is then used in the authorization config, e.g. for BasicAuth it would be password_ref. These would eventually be replaced with the upstream version once it is released.

One way this implementation differs from service discovery is that configuration changes are incremental: when a secret is added or removed, the Kubernetes secret provider simply adds or removes a Kubernetes WATCH API call, while existing watches are kept intact.
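To make the shape of the change concrete, here is a rough sketch of what such a configuration might look like. Only kubernetes_sp_config and password_ref appear in the description above; every other field name is illustrative and may not match the actual implementation.

```yaml
# Hypothetical sketch; field names other than kubernetes_sp_config and
# password_ref are illustrative, not the PR's actual schema.
kubernetes_sp_config:
  # Connection settings for the Kubernetes API server
  # (e.g. kubeconfig path or in-cluster credentials).

secrets:
  kubernetes:
    - name: am_password        # the secret reference name
      namespace: monitoring    # which Kubernetes secret to fetch
      secret: alertmanager-auth
      key: password

scrape_configs:
  - job_name: example
    basic_auth:
      username: admin
      password_ref: am_password  # resolved via the secret provider
```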

@pintohutch
Collaborator

Thanks for the awesome work @TheSpiritXIII!

@bwplotka @macxamin @bernot-dev - I'd love for any or all of us to begin reviewing this change to internalize the approach and ask questions. PTAL if you have capacity

@TheSpiritXIII TheSpiritXIII force-pushed the TheSpiritXIII/secret-management branch 2 times, most recently from 82b9ea0 to db9ca97 on January 22, 2024 18:06
@TheSpiritXIII TheSpiritXIII force-pushed the TheSpiritXIII/release-2.43.1-gmp branch 4 times, most recently from 12397b3 to 78f252c on January 24, 2024 14:42
Base automatically changed from TheSpiritXIII/release-2.43.1-gmp to release-2.43.1-gmp January 24, 2024 15:04
@bwplotka
Collaborator

Ideally this only adds the change we need for native secrets, otherwise it's impossible to review and merge 🤔

@TheSpiritXIII TheSpiritXIII marked this pull request as ready for review January 25, 2024 23:48
@TheSpiritXIII
Member Author

Ideally this only adds the change we need for native secrets, otherwise it's impossible to review and merge 🤔

Does this PR still feel too big? I admit it has some generic SecretProvider interfaces. The intention is that we could change the implementation if we have performance concerns. For example, we could add a poll-based watcher if we feel the watches overload the Kubernetes API server, or even a hybrid approach where watches are used up to a threshold and then we switch over to polling. Perhaps it can still be trimmed a little more, though.

Collaborator

@bwplotka bwplotka left a comment

I prefer bigger ones for a self-contained change.

Nice, LGTM for our MVP.

Love the code, good tests.

The only things we have to double-check for a proper upstream feature are:

  • Double-checking whether we should revise the configuration layer (avoiding references)
  • Integrating with the plugin system (I don't know how that works yet, other than that it exists for SD).

config/config.go
secrets/manager.go (outdated)
@TheSpiritXIII TheSpiritXIII force-pushed the TheSpiritXIII/secret-management branch 2 times, most recently from 3ae35b6 to d475e9d on February 5, 2024 21:31
@TheSpiritXIII
Member Author

@bwplotka requesting re-review in case you want to look at instrumentation changes and the new feature flag!

Collaborator

@bwplotka bwplotka left a comment

Nice thanks!

I left some comments, mainly for proper upstream version, so I'm happy to merge this one in this version for now.

cmd/prometheus/main.go
@@ -199,6 +203,9 @@ func (c *flagConfig) setFeatureListOptions(logger log.Logger) error {
		c.tsdb.EnableNativeHistograms = true
		c.scrape.EnableProtobufNegotiation = true
		level.Info(logger).Log("msg", "Experimental native histogram support enabled.")
	case "kubernetes-secret-provider":
Collaborator

Suggested change
case "kubernetes-secret-provider":
case "secret-provider":

Member Author

Intentionally different from what we want in upstream Prometheus. See earlier comment.

config/config.go
secrets/kubernetes/config.go (outdated)
secrets/manager.go
secrets/kubernetes/watch.go
secrets/secrets.go (outdated)
secrets/secrets.go
secrets/manager.go
@TheSpiritXIII TheSpiritXIII force-pushed the TheSpiritXIII/secret-management branch 4 times, most recently from a3496fe to f8ca684 on February 13, 2024 16:09
Collaborator

@pintohutch pintohutch left a comment

Partial review. Still digesting the code additions and the config surface.

Overall, I'm not blocking with my comments here, as this is going to be a black box for users who will never use the configuration options themselves. If the change works, we can ship it and work on refining/simplifying the code later.

secrets/secrets.go
cmd/prometheus/main.go
secrets/manager.go
secrets/manager.go
secrets/kubernetes/config.go
secrets/kubernetes/watch.go
secrets/manager.go
secrets/kubernetes/watch.go
Collaborator

@pintohutch pintohutch left a comment

I'm still reading through to give the rest of the review - but PTAL at my current comments

Collaborator

@pintohutch pintohutch left a comment

Ok - first pass done. PTAL

secrets/secrets.go
// Provider is a secret provider.
type Provider[T any] interface {
	// Add returns the secret fetcher for the given configuration.
	Add(ctx context.Context, config *T) (Secret, error)
Collaborator

What about mimicking the Discoverer API, which is much smaller (and presumably easier for adopters to implement)?

I think with that, we could also borrow a lot of the same code to implement secret updates using channels, which has some nice advantages and learnings from upstream (example)

Member Author

As far as ease goes, one could argue that this PR's way is easier. You could fall back to an "easy" implementation by simply not implementing all methods. For example, a simple Kubernetes implementation might implement Add and return a Secret that just wraps the API server GET. There is no need to implement Update or Remove because there's no state.

The lack of dynamic updating of service discovery is a known restriction in Prometheus. If we adopted the same methods as the existing service discovery, we'd need to reopen all watches each time the configuration changes. We would need to add jitter to prevent being too bursty, as well as the hack you pointed out.

The biggest reason for the channels in service discovery is that custom service discovery is just a wrapper over the service discovery interface that outputs a scrape config file with static HTTP endpoints. Because of that, configuration updates occur frequently there. I therefore argue against implementing channels for batching configuration updates; they would also buy us little, since I diff each secret configuration to see whether it changed and call Update accordingly.

Collaborator

I'm not quite following, as channels are already the catalyst for configuration updates when leveraging the reloader.

In general, I don't see how sending a channel receiver as an argument in an API prevents implementations from using it efficiently.

I also think having "optional" parts of an API could be misleading for implementers.

Either way, none of this is blocking. We can stick with this implementation in our fork as we're the only implementers.

@TheSpiritXIII
Member Author

Moving this change to prometheus-engine here: GoogleCloudPlatform/prometheus-engine#864
