Silenced entries can fail to be deleted if there are many for an entity #4384

Closed
echlebek opened this issue Aug 12, 2021 · 4 comments · Fixed by #4426
Labels
bug component:api Sensu API improvements
Milestone

Comments

@echlebek
Contributor

echlebek commented Aug 12, 2021

Expected Behavior

Expired silenced entries will be deleted, no matter how many there are.

Current Behavior

{"component":"store","error":"internal error: etcdserver: too many operations in txn request","level":"error","msg":"error deleting expired silenced entries","time":"2021-08-10T20:57:55Z"}

It seems that when the number of silenced entries for an entity passes a certain threshold, trying to delete all of them in a single transaction produces an error.

Possible Solution

Batch the operations in smaller amounts.
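
A minimal sketch of what batching could look like, not the actual fix that was merged. It assumes a batch size of 128, which is etcd's default for `--max-txn-ops`. The names `chunk`, `deleteInBatches`, `maxOpsPerTxn`, and the `commit` callback are hypothetical; `commit` stands in for whatever issues one delete transaction against the store. The key names under `/sensu.io/silenced` are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// maxOpsPerTxn is an assumed batch size. etcd's default --max-txn-ops is 128,
// so staying at or below it keeps each transaction within the limit.
const maxOpsPerTxn = 128

// chunk splits keys into consecutive slices of at most size elements.
func chunk(keys []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(keys); start += size {
		end := start + size
		if end > len(keys) {
			end = len(keys)
		}
		batches = append(batches, keys[start:end])
	}
	return batches
}

// deleteInBatches calls commit once per batch and stops at the first error.
// It returns how many keys were handed to successful commits.
func deleteInBatches(keys []string, size int, commit func(batch []string) error) (int, error) {
	if size <= 0 {
		return 0, errors.New("batch size must be positive")
	}
	deleted := 0
	for _, batch := range chunk(keys, size) {
		if err := commit(batch); err != nil {
			return deleted, fmt.Errorf("deleting batch of %d keys: %w", len(batch), err)
		}
		deleted += len(batch)
	}
	return deleted, nil
}

func main() {
	// 800 illustrative keys; the entry names are hypothetical.
	keys := make([]string, 800)
	for i := range keys {
		keys[i] = fmt.Sprintf("/sensu.io/silenced/default/entry-%d", i)
	}

	txns := 0
	n, err := deleteInBatches(keys, maxOpsPerTxn, func(batch []string) error {
		txns++ // a real commit would issue one delete transaction here
		return nil
	})
	fmt.Println(n, txns, err) // prints "800 7 <nil>"
}
```

One trade-off of this approach: the deletion is no longer atomic across the whole set, so a failure partway through leaves some entries deleted and others not. For expired entries that is likely acceptable, since a later cleanup pass can retry the remainder.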

@echlebek echlebek added the bug label Aug 12, 2021
@portertech
Contributor

Seems max operations per transaction is configurable, see etcd-io/etcd#10048

@portertech
Contributor

We probably want to increase the default value when running with embedded etcd.

@tarcinil
Contributor

We've noticed this error too.

However, ours is different in that it can sometimes cause the sensu-backend service itself to fail to start. We believe this was caused by someone accidentally silencing everything; we had over 800 keys in the /sensu.io/silenced keyspace.

You have to increase the etcd --max-txn-ops flag carefully: if it is set too high, you can start hitting "context deadline exceeded" errors because transactions take too long. I noticed this when initializing keepalives for namespace/node.

Caveats:

  • We are running a 3-node external etcd 3.5.0 cluster (chef managed).
  • We are running a community build at 6.4.0 (chef managed).

@echlebek
Contributor Author

@tarcinil yeah this is an unfortunate side effect of trying to batch silenced operations into transactions. Sensu needs to be patched to use multiple batches instead of trying to do everything in a single txn.

@calebhailey calebhailey added this to the 6.6.0 milestone Aug 26, 2021
@calebhailey calebhailey added the component:api Sensu API improvements label Aug 26, 2021
@calebhailey calebhailey modified the milestones: 6.next, 6.6.0 Sep 23, 2021
@calebhailey calebhailey modified the milestones: 6.6.0, 6.5.0 Oct 1, 2021

4 participants