
Operator is constantly pushing UPDATE calls without making any changes? #3002

Open · diranged opened this issue May 31, 2024 · 5 comments
Labels: area:collector (Issues for deploying collector), bug (Something isn't working)

@diranged
Component(s)

collector, target allocator

What happened?

Description

We noticed that the OTEL Operator seems to be constantly re-updating the resources it manages, sending UPDATE calls every 15-30s without making any changes. Here's an example of the UPDATE calls graphed from the EKS audit logs:

[image: graph of the operator's UPDATE calls from the EKS audit logs]

If you narrow the scope down and look at any single resource it's updating, you'll notice that the resourceVersion is not changing; it seems no actual changes are being made to the resource:

[image: audit log entries for a single resource showing an unchanged resourceVersion]

This is mostly just annoying... however, it does have a real impact, because each UPDATE call triggers a Kyverno policy of ours, which then causes an AdmissionReport to be created, which then has to be consumed. Basically, it's a bunch of churn for no reason I can identify.
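In case it helps anyone confirm this without audit-log access, here's a minimal client-go sketch that polls a managed Service's resourceVersion (the `otel` namespace and `my-collector-monitoring` name are placeholders; substitute your own). If the operator's UPDATE calls carried real changes, this value would move between samples; for us it stays constant:

```go
// Sketch: poll a managed Service's resourceVersion to confirm the
// operator's UPDATE calls are no-ops. Namespace and Service name below
// are placeholders, not the operator's actual resource names.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	for {
		svc, err := client.CoreV1().Services("otel").Get(
			context.TODO(), "my-collector-monitoring", metav1.GetOptions{})
		if err != nil {
			panic(err)
		}
		// A real write would bump resourceVersion; a no-op UPDATE does not.
		fmt.Printf("%s resourceVersion=%s\n",
			time.Now().Format(time.RFC3339), svc.ResourceVersion)
		time.Sleep(10 * time.Second)
	}
}
```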

Kubernetes Version

1.28.0

Operator version

0.101.0

Collector version

0.101.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

Log output

No response

Additional context

No response

@diranged added the bug (Something isn't working) and needs triage labels on May 31, 2024
@jaronoff97 (Contributor)

It looks like it's specifically for the monitoring service. Are there any operator logs you can share? CC @iblancasa, who was working on this recently.

@jaronoff97 added the area:collector (Issues for deploying collector) label and removed the needs triage label on Jun 3, 2024
@diranged (Author) commented Jun 3, 2024

> It looks like it's specifically for the monitoring service. Are there any operator logs you can share? CC @iblancasa, who was working on this recently.

I put the operator into debug mode, and there were zero relevant logs: just lots of spam about the PDB not being created, nothing else useful.

@LaikaN57 commented Jun 4, 2024

> It looks like it's specifically for the monitoring service [...]

@jaronoff97 This is actually happening on all otel services. The OP was posting an example of just one of the affected UIDs. Sorry for the confusion here.

Query:

```
fields @timestamp as ts, objectRef.name as name, objectRef.resourceVersion as ver, requestObject.metadata.resourceVersion as reqVer, responseObject.metadata.resourceVersion as rspVer, @message as msg
| filter @logStream like "kube-apiserver-audit"
| filter verb = "update"
| filter objectRef.resource = "services"
| filter user.username = "system:serviceaccount:otel:operator"
| sort @timestamp desc
```

Results:

[image: query results showing identical request and response resourceVersions across all otel Services]

@iblancasa (Contributor)

> It looks like it's specifically for the monitoring service. Are there any operator logs you can share? CC @iblancasa, who was working on this recently.

I have been reviewing, and I don't see which of my recent changes could produce this effect. I'll be happy to help in any way; I'll try to reproduce it.
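For reference while reproducing: the usual shape of this symptom in a controller-runtime reconciler is an unconditional Update on every reconcile loop. The sketch below is generic (not the operator's actual code; reconcileService is a hypothetical helper), but guarding the Update with a semantic equality check is the typical fix, since a skipped no-op Update also skips the audit event and the downstream Kyverno churn:

```go
// Generic controller-runtime sketch, not the operator's actual code:
// only issue an Update when the desired state differs from what's in
// the cluster, so no no-op UPDATE call ever reaches the API server.
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/equality"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func reconcileService(ctx context.Context, c client.Client, existing, desired *corev1.Service) error {
	// Nothing to do if the spec and metadata already match.
	if equality.Semantic.DeepEqual(existing.Spec, desired.Spec) &&
		equality.Semantic.DeepEqual(existing.Labels, desired.Labels) &&
		equality.Semantic.DeepEqual(existing.Annotations, desired.Annotations) {
		return nil
	}
	updated := existing.DeepCopy()
	updated.Spec = desired.Spec
	updated.Labels = desired.Labels
	updated.Annotations = desired.Annotations
	return c.Update(ctx, updated)
}
```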

@diranged (Author) commented Jun 4, 2024

> > It looks like it's specifically for the monitoring service. Are there any operator logs you can share? CC @iblancasa, who was working on this recently.
>
> I have been reviewing, and I don't see which of my recent changes could produce this effect. I'll be happy to help in any way; I'll try to reproduce it.

Thank you. If there's any need for us to run a test build of the operator that enables any kind of debug logging, we're happy to do that.
