Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port opencensus integration to opentelemetry #3126

Open
slinkydeveloper opened this issue May 11, 2020 · 24 comments
Open

Port opencensus integration to opentelemetry #3126

slinkydeveloper opened this issue May 11, 2020 · 24 comments
Assignees
Labels
area/observability kind/feature-request priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Issues which should be fixed (post-triage)
Milestone

Comments

@slinkydeveloper
Copy link
Contributor

Problem
Now we use opencensus, which is going to be deprecated, we need to switch to https://github.com/open-telemetry/opentelemetry-go

Persona:
Knative contributor

Exit Criteria
We don't depend anymore on opencensus

Time Estimate (optional):
🤷‍♂️

Additional context (optional)
We also need to handle this: knative/eventing-contrib#1155 (comment)

@lionelvillard
Copy link
Member

Related to knative/pkg#855

@slinkydeveloper
Copy link
Contributor Author

Oh thanks for spotting it!

@grantr
Copy link
Contributor

grantr commented May 11, 2020

Looks like OpenTelemetry Go SDK is in beta now: https://github.com/open-telemetry/opentelemetry-go. So the next step is making sure that OT meets all our requirements.

@grantr grantr added area/observability priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels May 11, 2020
@slinkydeveloper
Copy link
Contributor Author

@grantr do we have some document with these requirements defined? How would you proceed?

@vaikas
Copy link
Contributor

vaikas commented May 18, 2020

@evankanderson @anniefu would you mind sharing what you had in mind? I know you had done a lot of work here.

@anniefu
Copy link

anniefu commented May 18, 2020

The upcoming metrics plan that Evan wrote up https://github.com/knative/pkg/blob/master/metrics/README.md is primarily about moving to a centralized collector service model for metrics.

Since, the OpenCensus Collector Service agent and the OpenTelemetry Collector Service agent will both support the OpenCensus client libraries, we can port from opencensus client libraries to the opentelemetry client libraries independently of the new metrics plan.

The last time we talked to OpenTelemetry back in february, their metrics models were still in flux and they recommended @evankanderson make changes to opencensus library to support Knative rather than trying to early-migrate to Otel client libraries.

Given that, I think we want the metrics plan to land and maybe wait beyond Otel beta before considering porting Knative libraries.

@anniefu
Copy link

anniefu commented May 18, 2020

In terms of knative/eventing-contrib#1155, tracing and metrics in Knative are handled completely separately right now, so it is possible to migrate to OpenTelemetry for tracing, but not metrics.

I'm not very familiar with the world of tracing, so I'll leave that judgment call to others.

@slinkydeveloper
Copy link
Contributor Author

Links to cloudevents/sdk-go#554

@github-actions
Copy link

github-actions bot commented Dec 2, 2020

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 2, 2020
@slinkydeveloper
Copy link
Contributor Author

/remove-lifecycle stale

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 2, 2020
@skonto
Copy link
Contributor

skonto commented Jan 21, 2021

We discussed this a bit yesterday (meeting minutes here), I will take a look by porting the Api Server source to the new model and then we can discuss the next steps.

@devguyio
Copy link
Contributor

@skonto that sounds great, can you please create an issue for that ?

@skonto
Copy link
Contributor

skonto commented Jan 21, 2021

Sure @devguyio

@slinkydeveloper
Copy link
Contributor Author

@skonto is this issue still useful? Or is there an updated one and we can close this one?

@skonto
Copy link
Contributor

skonto commented Feb 12, 2021

@slinkydeveloper let's keep this is an umbrella ticket for the Eventing migration work. The other ticket is just a sub-task.

@github-actions
Copy link

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 14, 2021
@ustiugov
Copy link

hi all, I wonder when we could expect eventing to support OpenTelemetry.

@skonto
Copy link
Contributor

skonto commented Jun 24, 2021

@ustiugov if you are referring to the OTEL client go lib afaik it is not ready from a metrics perspective. Tracing is in better shape upstream. Integrating with the OTEL collector is possible as described in the docs.

@ustiugov
Copy link

@ustiugov if you are referring to the OTEL client go lib afaik it is not ready from a metrics perspective. Tracing is in better shape upstream. Integrating with the OTEL collector is possible as described in the docs.

@skonto thank you. OpenTelemetry traces for Serving and our gRPC based functions work well.

With eventing, we are able to visualize Knative components but we do not know how to add the functions (also deployed with Serving), which generate ClouEvents and get triggered by them, to the same traces.

I wonder if there are any examples for that.

@skonto
Copy link
Contributor

skonto commented Jun 24, 2021

@ustiugov cloudevents client is observability aware so if you try setup tracing as in https://knative.dev/docs/eventing/accessing-traces/ you should see them for existing components. Same applies to user services that utilize the same api and setup tracing accordingly: https://github.com/cloudevents/sdk-go/blob/release-2.1/samples/http/receiver-traced/main.go. For a concrete trace handler using opencensus check:

pOpts = append(pOpts, cloudevents.WithRoundTripper(&ochttp.Transport{
.
Not sure if there are more samples maybe @slinkydeveloper has more pointers.

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 29, 2021
@github-actions
Copy link

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 14, 2021
@ustiugov
Copy link

/remove-lifecycle stale

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 14, 2021
@github-actions
Copy link

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 13, 2022
@ustiugov
Copy link

/remove-lifecycle stale

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 13, 2022
@lionelvillard lionelvillard added the triage/accepted Issues which should be fixed (post-triage) label Jan 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/observability kind/feature-request priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Issues which should be fixed (post-triage)
Projects
None yet
Development

No branches or pull requests

10 participants