-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding aws-s3 metric for sqs worker utilization #34793
Conversation
This pull request is now in conflicts. Could you fix it? 🙏
|
Pinging @elastic/security-external-integrations (Team:Security-External Integrations) |
This pull request is now in conflicts. Could you fix it? 🙏
|
Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>
…fying calculation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some minor comments, but this structure looks good.
I do think there could be some inaccuracies with the metric value for long running SQS workers. For example if all five workers take 10 seconds to complete then in the first period the utilization will be 0 and then next it would be 2 (which is out of range).
Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
… s3-metric-utilization
@andrewkroh I added some more logic into the calculation function to resolve that situation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would need to track the start times of each SQS worker to know if it was running for the whole 5 second interval or a fraction of it. I experimented with a change that uses a beginSQSWorker()
and endSQSWorker()
method to track each worker. I'm going to push that up and see what you think.
maxUtilization := float64(d) * float64(maxMessagesInflight) | ||
utilizedRate := float64(atomic.SwapInt64(&m.utilizationNanos, 0)) / maxUtilization | ||
|
||
if utilizedRate == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we have two workers operating? One is long running (taking more than 5 seconds) and one is operating on short jobs (like completing every 100ms).
Track the running time of each SQS worker in order to accurately compute the utilization of the workers after each 5 second period.
/test |
I think those functions make sense. It seems much more thorough, while still being easy to follow |
What does this PR do?
Adding the metric to answer how many utilized the SQS workers are . 0 indicates free, 1 indicates all are working the entire polling duration of 5 seconds.
Checklist
I have made corresponding change to the default configuration filesCHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.How to test this PR locally
http://localhost:5066/dataset?pretty
now includes the followingTODO do we want to limit how many decimal places?
Related issues