Add metrics to the catalog server #156

Merged

Conversation

everettraven
Collaborator

@everettraven everettraven commented Aug 31, 2023

Description

  • Adds metrics to the catalog server for calculating the apdex score

Motivation

Note
This PR is based on #148 and should only be merged after it. Because of this, it will remain a draft until #148 has merged.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 31, 2023
@codecov

codecov bot commented Aug 31, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (966a4d6) 79.06% compared to head (484acfd) 79.06%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #156   +/-   ##
=======================================
  Coverage   79.06%   79.06%           
=======================================
  Files           3        3           
  Lines         215      215           
=======================================
  Hits          170      170           
  Misses         28       28           
  Partials       17       17           

that can be used for calculating the Apdex Score
and assessing the health of the HTTP server that is
serving catalog contents to clients

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@everettraven everettraven changed the title WIP: Add metrics to the catalog server Add metrics to the catalog server Sep 8, 2023
@everettraven everettraven marked this pull request as ready for review September 8, 2023 16:49
@everettraven everettraven requested a review from a team as a code owner September 8, 2023 16:49
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 8, 2023
cmd/manager/main.go (outdated review thread, resolved)
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
anik120
anik120 previously requested changes Sep 8, 2023
pkg/server/metrics.go (outdated review thread, resolved)
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
// calculate Apdex Scores up to a T of 1 second, but using various mathematical formulas we
// should be able to estimate Apdex Scores up to a T of 2.5. Having a larger range of buckets
// will allow us to more easily calculate health indicators other than the Apdex Score.
Buckets: []float64{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.2, 1.6, 2, 2.4, 2.8, 3.2, 3.6, 4, 10},
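
For readers unfamiliar with the Apdex Score mentioned in the comment above, here is a minimal sketch of how it could be derived from cumulative histogram counts. The function name `apdex` and its parameters are hypothetical and not part of this PR; the sketch assumes the usual definition, where requests at or under T are "satisfied" and requests over T but at or under 4T are "tolerating".

```go
package main

import "fmt"

// apdex estimates an Apdex score from cumulative histogram counts.
// This is an illustrative helper, not code from this PR.
//
//	satisfied: number of requests with duration <= T
//	within4T:  number of requests with duration <= 4T (includes satisfied)
//	total:     number of all observed requests
//
// The boolean is false when the score is undefined (no requests, or
// counts that are not cumulative).
func apdex(satisfied, within4T, total uint64) (float64, bool) {
	if total == 0 || within4T < satisfied || total < within4T {
		return 0, false
	}
	tolerating := within4T - satisfied
	return (float64(satisfied) + float64(tolerating)/2) / float64(total), true
}

func main() {
	// Hypothetical counts for T = 1s: 60 requests took <= 1s,
	// 90 took <= 4s, and 100 requests were observed in total.
	score, ok := apdex(60, 90, 100)
	fmt.Println(score, ok)
}
```

With T = 1s, the `1` and `4` buckets in the list above would supply the first two counts, which is why the bucket boundaries matter for the choice of T.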
Member

Does it matter that our write timeout is 10s and the max bucket duration is 10s?

Seems like we'll only ever get whatever error code maps to that timeout in the 10s bucket.

Collaborator Author

I don't think so. If anything, I think it means that we now have buckets that capture all possible response times, which allows us to calculate more metrics on the fly. This is all based on #156 (comment): if no requests take more than 10s, we will never have anything in the "Inf" bucket.

That being said, I could be wrong - I don't have enough experience in this area to truly know, and I am making an assumption based on what I currently know.

Collaborator Author

@everettraven everettraven Sep 8, 2023

> Seems like we'll only ever get whatever error code maps to that timeout in the 10s bucket.

Any response time > 4s and <= 10s will fall in that 10s bucket.
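
To illustrate the point, here is a small sketch that looks up the smallest bucket a given response time lands in, using the bucket bounds from this PR. The helper name `bucketFor` is hypothetical, and the lookup uses only the Go standard library rather than the Prometheus client. Prometheus histogram buckets are cumulative, so an observation is also counted in every larger bucket; this only reports the first one.

```go
package main

import (
	"fmt"
	"sort"
)

// buckets mirrors the upper bounds (in seconds) proposed in this PR.
var buckets = []float64{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.2, 1.6, 2, 2.4, 2.8, 3.2, 3.6, 4, 10}

// bucketFor returns the upper bound of the smallest bucket that contains a
// response time given in seconds, or "+Inf" if it exceeds every bound.
// Illustrative helper only; not part of this PR.
func bucketFor(seconds float64) string {
	// SearchFloat64s returns the smallest index i with buckets[i] >= seconds.
	i := sort.SearchFloat64s(buckets, seconds)
	if i == len(buckets) {
		return "+Inf"
	}
	return fmt.Sprintf("%g", buckets[i])
}

func main() {
	for _, s := range []float64{4, 4.5, 10, 10.5} {
		fmt.Printf("%gs -> le=%s\n", s, bucketFor(s))
	}
}
```

A 4.5s response reports the `10` bucket, and only a response slower than 10s would fall through to `+Inf`.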

Member

Ah, got it. sgtm.

@joelanford joelanford dismissed anik120’s stale review September 8, 2023 17:17

The suggested change was accepted. Huzzah!

@everettraven everettraven added this pull request to the merge queue Sep 8, 2023
Merged via the queue into operator-framework:main with commit a1663ec Sep 8, 2023
10 checks passed
Development

Successfully merging this pull request may close these issues.

Add metrics to the Storage implementation
5 participants