Add Opentelemetry tracing #2178

richardcase · 2020-12-29T20:39:38Z

/kind feature
/help

Describe the solution you'd like
I'd like to be able to trace the various controllers using OpenTelemetry. This would show how many reconcile loops, AWS API calls, etc. occur and how long each takes, which would be very helpful when investigating issues.

Environment:

Cluster-api-provider-aws version:
Kubernetes version: (use kubectl version):
OS (e.g. from /etc/os-release):


k8s-ci-robot · 2020-12-29T20:39:39Z

@richardcase:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

randomvariable · 2021-01-05T12:10:13Z

We should probably help take this to completion: kubernetes-sigs/controller-runtime#1211

richardcase · 2021-01-05T12:15:26Z

I agree, @randomvariable.

We could also start adding OpenTelemetry spans, as CAPZ has done. Then, once kubernetes-sigs/controller-runtime#1211 is agreed and implemented, we'd need to make sure that our spans use the existing parent spans from the context when we create them.

fejta-bot · 2021-06-09T19:49:44Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

richardcase · 2021-06-09T20:18:18Z

/lifecycle frozen

randomvariable · 2021-11-08T19:30:13Z

/priority backlog

bisakhmondal · 2022-03-05T12:56:40Z

Hi all, I am Bisakh. As part of GSoC 2022, I would like to improve the observability of this module by adding tracing and Prometheus-based metrics monitoring. I am new to this project and want to get involved in the core infrastructure. Could you guide me by passing on some of your workload so that I can learn by doing?
Eagerly waiting for your reply. Thanks : )

sedefsavas · 2022-03-08T04:38:48Z

Hi @bisakhmondal, welcome to the community.

You can start with some issues that have the good-first-issue label to get a grip on the project.
Here is the getting started doc: https://cluster-api-aws.sigs.k8s.io/getting-started.html
Feel free to reach us in the #cluster-api-aws Slack channel with questions.

We haven't started working on the design yet.
Applications for GSoC will start in April AFAIK, and it is not certain that our project is accepted yet.

bisakhmondal · 2022-03-11T18:01:02Z

Hi @sedefsavas, thanks for the detailed reply. I'll start working on the issues then.

We haven't started working on the design yet.

No issues, I can also help there. Feel free to include me in your design meeting.

Applications for GSoC will start in April AFAIK, and it is not certain that our project is accepted yet.

Yes, the first week of April. I have a strong interest in cloud-native technology and have recently been working with monitoring tools and solutions, so I want to explore more. Just curious, what do you mean by selection here? I see the project is listed on the CNCF page (the cncf/mentoring repo, I think).

xylonx · 2022-03-31T07:06:10Z

Hi, I am xylonx. I would like to solve this issue by introducing observability tools such as Prometheus or Jaeger and collecting the metrics with OpenTelemetry, and I would like to make it my GSoC 2022 project. I am new to this community and hope this will build my cloud-native knowledge and skills. I have used OpenTelemetry and Prometheus/Jaeger before, so I think I can solve it. Could you give me some practical advice? Hoping for your reply :)

richardcase · 2022-04-06T17:39:11Z

@rvacaru @xylonx @bisakhmondal - looks like the deadline for submitting a proposal for GSoC is 19th April. Would it be worth us having a call to go through this issue?

richardcase · 2022-04-07T17:35:16Z

It would be worth having a look at the demo Bryan did in the CAPI office hours on 16th Sept 2021: https://youtu.be/aThFgrYthOc

richardcase · 2022-04-07T17:37:43Z

There was an issue on CAPI as well to add this but it was closed as it went stale: kubernetes-sigs/cluster-api#3760.

Additionally, CAPZ added OpenTelemetry support; see:

Add tracing and metrics - Open Telemetry cluster-api-provider-azure#311
Export traces in OpenTelemetry format and then to Azure Monitor cluster-api-provider-azure#1113

HeyAdityaPatidar · 2022-04-19T17:05:03Z

Hey! @sedefsavas and @richardcase, I'm Aditya, I want to work on the project for GSoC. Is it already assigned to anyone else?
Can you please acknowledge me on this project?
Can you help me with this problem?

richardcase · 2022-04-21T08:13:33Z

Hi @HeyAdityaPatidar. I believe the deadline to submit a proposal for GSoC has now passed. You could try logging on to the portal to see if it still lets you submit a proposal.

richardcase · 2022-07-12T15:30:15Z

/remove-lifecycle frozen

k8s-triage-robot · 2022-10-23T19:25:36Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Aut0R3V · 2022-11-13T21:06:57Z

@richardcase I would like to work on this issue. I went through the discussion that you guys had. Could you tell me what needs to be done for this? I would like to assign myself for this issue.

richardcase · 2022-11-14T13:10:50Z

/assign Aut0R3V

k8s-ci-robot added the kind/feature and help wanted labels Dec 29, 2020

randomvariable added this to the v0.7.0 milestone Jan 5, 2021

randomvariable moved this from the v0.7.0 milestone to v0.7.x Mar 11, 2021

k8s-ci-robot added the lifecycle/stale label Jun 9, 2021

k8s-ci-robot added the lifecycle/frozen label and removed the lifecycle/stale label Jun 9, 2021

randomvariable moved this from the v0.7.x milestone to Backlog Nov 8, 2021

k8s-ci-robot added the priority/backlog label Nov 8, 2021

k8s-ci-robot removed the lifecycle/frozen label Jul 12, 2022

richardcase removed this from the Backlog milestone Jul 25, 2022

k8s-ci-robot added the lifecycle/stale label Oct 23, 2022

k8s-ci-robot assigned Aut0R3V Nov 14, 2022

Ankitasw mentioned this issue Dec 5, 2022

metrics port is not exposed properly in default configuration #3856

Closed

Fedosin mentioned this issue Aug 20, 2023

Enable OpenTelemetry support #4454

Closed
