
✨ Add OpenCensus controller metrics #368

Closed
wants to merge 3 commits

Conversation

@grantr (Contributor) commented Mar 21, 2019

Adds OpenCensus metrics to the controller in the same format as the existing Prometheus metrics. Prometheus metrics are still collected and served through the previous metrics server configuration, so existing consumers of the Prometheus metrics should be unaffected.

No OpenCensus exporter is configured, but default views are provided. A follow-up PR may add a default exporter for simplicity, but the expectation is that user code generally takes care of creating exporters.

The client-go metrics are not yet hooked up to OpenCensus because only one metrics handler is allowed. A future PR (EDIT: to controller-runtime) will likely fix this.

Fixes #305.
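
As a rough sketch of what "user code generally takes care of creating exporters" could look like, the snippet below wires the OpenCensus Prometheus exporter into the view pipeline. The exporter package path, namespace, and listen address are illustrative assumptions, not part of this PR:

```go
package main

import (
	"log"
	"net/http"

	ocprom "contrib.go.opencensus.io/exporter/prometheus"
	"go.opencensus.io/stats/view"
)

func main() {
	// Create an OpenCensus exporter. This PR deliberately leaves this step
	// to user code instead of configuring a default exporter.
	pe, err := ocprom.NewExporter(ocprom.Options{Namespace: "controller_runtime"})
	if err != nil {
		log.Fatalf("failed to create Prometheus exporter: %v", err)
	}

	// Every registered view is reported through every registered exporter.
	view.RegisterExporter(pe)

	// The exporter is an http.Handler, so it can be mounted on whatever
	// server the user already runs (address and path are arbitrary here).
	http.Handle("/metrics", pe)
	log.Fatal(http.ListenAndServe(":8888", nil))
}
```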

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 21, 2019
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: grantr
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: directxman12

If they are not already assigned, you can assign the PR to them by writing /assign @directxman12 in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 21, 2019
grantr added commits:

Default views are now an array metrics.DefaultViews that can be registered and unregistered by the user. They now have the same names as the Prometheus metrics.
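
A minimal sketch of how that array might be consumed, assuming it is exported as a slice of *view.View from controller-runtime's pkg/metrics package (this PR was not merged, so the import path and the DefaultViews name are illustrative):

```go
package main

import (
	"log"

	"go.opencensus.io/stats/view"

	// Assumed location of the DefaultViews slice described in this PR.
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

func main() {
	// Opt in to the controller metrics by registering the default views.
	if err := view.Register(metrics.DefaultViews...); err != nil {
		log.Fatalf("failed to register default views: %v", err)
	}

	// ... start the manager and controllers here ...

	// Users who do not want these views can unregister them again.
	view.Unregister(metrics.DefaultViews...)
}
```
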
@brancz commented Apr 1, 2019

I'm not entirely against this in general, but could these types of changes please be brought up in sig-instrumentation, so that we can have a discussion about how to introduce or change something as deeply integrated into Kubernetes as metrics? (I realize that this is controller-runtime, not kubernetes/kubernetes, but there is already talk of client-go.)

@grantr (Contributor, Author) commented Apr 4, 2019

@brancz happy to do that! I'm not familiar with the normal SIG process, so please bear with me. How should this be surfaced to the sig-instrumentation group? Would a GitHub issue suffice?

@brancz commented Apr 5, 2019

A thread on the sig-instrumentation mailing list would be good, and/or you could start a discussion at the next meeting. :)

@grantr (Contributor, Author) commented Apr 5, 2019

@DirectXMan12 @droot you probably have more context than me on how this PR relates to sig-instrumentation.

@DirectXMan12 (Contributor) commented:

Not sure how much we want side-by-side here vs. some sort of shim; will need to take a closer look.

@grantr (Contributor, Author) commented Apr 5, 2019

I think a shim is the best long-term approach, and I'd be happy to switch to that here. I wanted to demonstrate that OpenCensus could live alongside the Prometheus client, but there may be no good reason to do that, because OpenCensus can export identical metrics output.

@DirectXMan12 (Contributor) commented:

> but there may be no good reason to do that because OpenCensus can export identical metrics output.

That's my thought. My only concern is breaking people using the global CR registry, but I suspect there aren't too many of those, and if we're going to break it, now's the time (put all the breaking changes together, before 1.0).

@grantr (Contributor, Author) commented Apr 12, 2019

@brancz I created a thread on the sig-instrumentation mailing list pointing out this PR.

> there is already talk of client-go

I think you're referring to this statement in the PR body:

> The client-go metrics are not yet hooked up to OpenCensus because only one metrics handler is allowed. A future PR will likely fix this.

This was inexact. By "future PR" I meant a future PR to controller-runtime that swaps metrics handlers from Prometheus client to OpenCensus. I wasn't proposing any changes to client-go. I'll clarify that.

@@ -173,10 +177,18 @@ func (c *Controller) Start(stop <-chan struct{}) error {
func (c *Controller) processNextWorkItem() bool {
	// This code copy-pasted from the sample-Controller.

	// Create a context used for tagging OpenCensus measurements.
	metricsCtx, _ := tag.New(context.Background(),
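
For readers unfamiliar with the OpenCensus API, the flow this new code feeds into is roughly: create tag keys and measures, tag a context, record measurements against it, and aggregate them in a registered view. A minimal sketch of that flow follows; the key, measure, and view names are illustrative, not the ones defined by this PR:

```go
package main

import (
	"context"
	"log"

	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
)

var (
	// Illustrative tag key and measure; the PR defines its own.
	keyController, _ = tag.NewKey("controller")
	mReconcileTotal  = stats.Int64("reconcile_total", "Total reconciliations", stats.UnitDimensionless)
)

func main() {
	// A view aggregates recorded measurements, broken down by tag keys.
	v := &view.View{
		Name:        "controller_runtime_reconcile_total",
		Description: "Total reconciliations per controller",
		Measure:     mReconcileTotal,
		TagKeys:     []tag.Key{keyController},
		Aggregation: view.Count(),
	}
	if err := view.Register(v); err != nil {
		log.Fatal(err)
	}

	// Tag a context, then record against it; the tags travel with the context
	// and become the view's label values.
	ctx, _ := tag.New(context.Background(), tag.Insert(keyController, "my-controller"))
	stats.Record(ctx, mReconcileTotal.M(1))
}
```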

Review comment:

If there are tags present in the background context, those values will continue to be propagated to the newly created context. As such, if we use tag.Insert in subsequent calls for existing tags, the values of existing tag keys will not be updated.

Is that the desired behavior?

grantr (Contributor, Author) replied:

@logicalhan Are you saying the context returned by context.Background() may have tags present? I'm not aware of a way to add values to the background context, so I don't expect there will ever be tags present in it. As such I believe that in this code there is no effective difference between tag.Insert and tag.Upsert.

If I'm misunderstanding your question, please let me know. :)

Review comment:

To maintain contextual information about a specific request, tags must be propagated through your distributed system: the higher levels generate tags that are then passed down to the lower-level services. Data is collected together with the tags and sent to your observability systems.

For example, a request comes in to the web server, and the web server tags all of its outgoing requests with the following tags:
originator=photo-app
frontend=web
These values are propagated all the way down to the database and the CDN.

With these tags, you can uniquely identify and break down which service called the downstream services, how much of their quota they have used, which calls are failing, and more.

In OpenCensus, tags can be propagated downstream and the values are stored in the context. Since this is the first time OpenCensus is being introduced, this is likely safe (since nothing else would be passing along OpenCensus tag information). However, that assumption may not hold in the future.

I'm actually not saying that tag.Insert is the incorrect thing to do here. I'm just pointing out that there is a difference in behavior between Insert and Upsert, and depending on what we actually want to do here, one may be more appropriate than the other. Insert can be a no-op if tag information is passed in by whatever triggers the controller. That may or may not be desirable.
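
A small sketch of the behavioral difference being discussed (the key and values are illustrative): tag.Insert only sets a key that is absent from the incoming context, while tag.Upsert overwrites an existing value.

```go
package main

import (
	"context"
	"fmt"

	"go.opencensus.io/tag"
)

func main() {
	keyResult, _ := tag.NewKey("result")

	// Pretend an upstream caller already tagged the context.
	upstream, _ := tag.New(context.Background(), tag.Insert(keyResult, "from-upstream"))

	// Insert is a no-op here because "result" is already present.
	inserted, _ := tag.New(upstream, tag.Insert(keyResult, "success"))
	v, _ := tag.FromContext(inserted).Value(keyResult)
	fmt.Println("after Insert:", v) // after Insert: from-upstream

	// Upsert overwrites the existing value.
	upserted, _ := tag.New(upstream, tag.Upsert(keyResult, "success"))
	v, _ = tag.FromContext(upserted).Value(keyResult)
	fmt.Println("after Upsert:", v) // after Upsert: success
}
```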

Review comment:

Actually, Upsert is probably safer, since it effectively forces the same behavior regardless of what happens upstream.

grantr (Contributor, Author) replied:

Thanks, that clarifies things!

In the hypothetical future where the metrics context is received from an external source that may have added tags of its own, I believe tag.Insert is correct here. If a tag is already specified by the user of the library, the library should respect the user's intent and avoid overwriting their metrics tags with tag.Upsert.

Does that seem reasonable?

Review comment:

I would imagine you would need to Upsert the result tag. If a request has gotten here, then it would presumably have been successful upstream.

Review comment:

Actually, you probably want to upsert all of the result tags.

Review comment:

Since controllers can requeue things and pick them up again later, if something gets requeued, wouldn't we expect the context to have tags? It seems we really do want to be judicious about choosing Insert or Upsert, since the choice will change the metric data that gets emitted.

It may be worth adding a test case around this: force a requeue, then force an error when the item gets picked up again, and compare the measurements with what we'd expect.
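
One way such a test could inspect what was recorded is view.RetrieveData, which returns the aggregated rows for a registered view. A rough sketch follows; the measure, view, and tag names are placeholders, and the loop merely simulates a requeue-then-success sequence instead of driving a real controller and work queue:

```go
package metrics_test

import (
	"context"
	"testing"

	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
)

func TestResultTagAfterRequeue(t *testing.T) {
	keyResult, _ := tag.NewKey("result")
	m := stats.Int64("reconcile_total", "Total reconciliations", stats.UnitDimensionless)
	v := &view.View{
		Name:        "reconcile_total",
		Measure:     m,
		TagKeys:     []tag.Key{keyResult},
		Aggregation: view.Count(),
	}
	if err := view.Register(v); err != nil {
		t.Fatal(err)
	}
	defer view.Unregister(v)

	// Simulate the first attempt being requeued and the retry succeeding,
	// recording one measurement per attempt with the result tag set.
	for _, result := range []string{"requeue", "success"} {
		ctx, _ := tag.New(context.Background(), tag.Upsert(keyResult, result))
		stats.Record(ctx, m.M(1))
	}

	rows, err := view.RetrieveData(v.Name)
	if err != nil {
		t.Fatal(err)
	}

	// Expect one row per result value, each with a count of 1.
	counts := map[string]int64{}
	for _, row := range rows {
		for _, tg := range row.Tags {
			if tg.Key.Name() == "result" {
				counts[tg.Value] = row.Data.(*view.CountData).Value
			}
		}
	}
	if counts["requeue"] != 1 || counts["success"] != 1 {
		t.Fatalf("unexpected counts: %v", counts)
	}
}
```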

@fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 11, 2019
@fejta-bot commented:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 10, 2019
@fejta-bot commented:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor) commented:

@fejta-bot: Closed this PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

DirectXMan12 pushed a commit that referenced this pull request Jan 31, 2020: drop type field from the CRD schema validation at the root level
Successfully merging this pull request may close these issues:

Consider using OpenTelemetry for metrics instead of Prometheus