Adding aws-s3 metric for sqs worker utilization #34793

Merged
kgeller merged 30 commits into elastic:main from s3-metric-utilization
Apr 3, 2023

Conversation

kgeller
Contributor

@kgeller kgeller commented Mar 9, 2023

What does this PR do?

Adds a metric that reports how utilized the SQS workers are. A value of 0 indicates that all workers were idle; 1 indicates that all workers were busy for the entire 5-second polling period.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

http://localhost:5066/dataset?pretty now includes the following:

"sqs_worker_utilization": 0.061649393

TODO: Do we want to limit the number of decimal places?

Related issues

@kgeller kgeller added the enhancement, Team:Security-External Integrations, and backport-skip labels Mar 9, 2023
@kgeller kgeller self-assigned this Mar 9, 2023
@botelastic botelastic bot added and then removed the needs_team label (indicates that the issue/PR needs a Team:* label) Mar 9, 2023
@mergify
Contributor

mergify bot commented Mar 9, 2023

This pull request now has conflicts. Could you fix them? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b s3-metric-utilization upstream/s3-metric-utilization
git merge upstream/main
git push upstream s3-metric-utilization

@elasticmachine
Collaborator

elasticmachine commented Mar 9, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-04-01T16:18:26.496+0000

  • Duration: 86 min 0 sec

Test stats 🧪

Test Results
Failed 0
Passed 2881
Skipped 179
Total 3060

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@kgeller kgeller marked this pull request as ready for review March 9, 2023 21:32
@kgeller kgeller requested a review from a team as a code owner March 9, 2023 21:32
@elasticmachine
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@kgeller kgeller requested review from andrewkroh and a team March 9, 2023 21:33
Three resolved review threads on x-pack/filebeat/input/awss3/input.go (outdated)
@kgeller kgeller marked this pull request as draft March 9, 2023 23:35
@kgeller kgeller marked this pull request as ready for review March 10, 2023 21:42
@kgeller kgeller requested a review from efd6 March 10, 2023 21:43
@mergify
Contributor

mergify bot commented Mar 16, 2023

This pull request now has conflicts. Could you fix them? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b s3-metric-utilization upstream/s3-metric-utilization
git merge upstream/main
git push upstream s3-metric-utilization

Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>
@kgeller kgeller requested a review from andrewkroh March 27, 2023 18:42
Member

@andrewkroh andrewkroh left a comment

I left some minor comments, but this structure looks good.

I do think the metric value could be inaccurate for long-running SQS workers. For example, if all five workers take 10 seconds to complete, then the utilization will be 0 in the first period and 2 in the next (which is out of range).

Two resolved review threads on x-pack/filebeat/input/awss3/metrics_test.go (outdated)
One resolved review thread on x-pack/filebeat/input/awss3/metrics.go (outdated)
@kgeller
Contributor Author

kgeller commented Mar 28, 2023

> I do think the metric value could be inaccurate for long-running SQS workers. For example, if all five workers take 10 seconds to complete, then the utilization will be 0 in the first period and 2 in the next (which is out of range).

@andrewkroh I added some more logic to the calculation function to resolve that situation.

@kgeller kgeller requested a review from andrewkroh March 30, 2023 13:05
Member

@andrewkroh andrewkroh left a comment

I think this would need to track the start times of each SQS worker to know if it was running for the whole 5 second interval or a fraction of it. I experimented with a change that uses a beginSQSWorker() and endSQSWorker() method to track each worker. I'm going to push that up and see what you think.

maxUtilization := float64(d) * float64(maxMessagesInflight)
utilizedRate := float64(atomic.SwapInt64(&m.utilizationNanos, 0)) / maxUtilization

if utilizedRate == 0 {
Member

What if we have two workers operating? One is long-running (taking more than 5 seconds) and the other is handling short jobs (for example, completing one every 100 ms).

Track the running time of each SQS worker in order to
accurately compute the utilization of the workers after
each 5 second period.
@andrewkroh
Member

/test

@kgeller
Contributor Author

kgeller commented Mar 31, 2023

> I think this would need to track the start times of each SQS worker to know if it was running for the whole 5 second interval or a fraction of it. I experimented with a change that uses a beginSQSWorker() and endSQSWorker() method to track each worker. I'm going to push that up and see what you think.

I think those functions make sense. The approach seems much more thorough while still being easy to follow.

@kgeller kgeller merged commit 91906c9 into elastic:main Apr 3, 2023
@kgeller kgeller deleted the s3-metric-utilization branch April 3, 2023 13:27