Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kanister Prometheus Framework Design Doc #2151

Merged
merged 18 commits into from
Aug 8, 2023

Conversation

mellon-collie
Copy link
Contributor

@mellon-collie mellon-collie commented Jun 30, 2023

Change Overview

This PR adds the design proposal for the creating of a Prometheus skeleton framework in Kanister to enable us to export metrics to Prometheus.

Link to the new .md file:
https://github.com/kanisterio/kanister/blob/kanister-prometheus-skeleton-design-doc/design/kanister-prometheus-integration.md

Pull request type

Please check the type of change your PR introduces:

  • 🚧 Work in Progress
  • 🌈 Refactoring (no functional changes, no api changes)
  • 🐹 Trivial/Minor
  • 🐛 Bugfix
  • 🌻 Feature
  • 🗺️ Documentation
  • 🤖 Test

@github-actions
Copy link
Contributor

Thanks for submitting this pull request 🎉. The team will review it soon and get back to you.

If you haven't already, please take a moment to review our project contributing guideline and Code of Conduct document.

@infraq infraq added this to In Progress in Kanister Jun 30, 2023
@mellon-collie mellon-collie force-pushed the kanister-prometheus-skeleton-design-doc branch from c1a2702 to 05ba015 Compare June 30, 2023 23:04
@mellon-collie mellon-collie force-pushed the kanister-prometheus-skeleton-design-doc branch from dca876a to 423aeb9 Compare July 6, 2023 00:31
@mellon-collie
Copy link
Contributor Author

Review comments from @ewhamilton

1st Feedback:
When initializing metric vectors, it is important not only to have a fixed and bounded set of label names, it is also important to know up front all of the label values for each label name.

  1. A general caution about using too many labels or high cardinality / not-fixed values of label values is probably appropriate for Kanister.
  2. For CounterVecs this allows the InitCounterVec helper to publish and initialize the counters for all permutations of label values. This is done using the Add(0) function, ensuring that the data structure is set up and any scrape will see a 0.
  3. Initializing to 0 is not appropriate for Gauges, so not needed for GaugeVecs.
  4. While some form of publishing a 0 could be done for histograms (which often want to use the rate() function of the sum and count counters, it would likely be implementation dependent-- better to minimize use of labels with histograms and accept some imprecision in histograms across restarts.

At least for InitCounterVec then, caller should pass in something like a slice of BoundedLabel structs:
struct metrics.BoundedLabel {LabelName: string; LabelValues: []string}
1 for each label name, with all expected values listed.

Other Feedback:
-> With reference to metrics in Points 1,2,3 of text description in the design doc, use struct instead of an interface.
-> Rewrite the example to use operation_type and action_set_resolution as label names, rather than blueprint_name.
-> Explore using the Collect interface method to check if there's anything that helps us unit test the Init methods. If not, go ahead and manually test it for now.
-> Write 2 examples for creating label names to label value mappings - one for metrics where there are just 1-2 labels, where we use a direct prometheus call and one for 4-5 labels with the wrapper method.
-> Write basic unit tests using the https://github.com/prometheus/client_golang/blob/main/prometheus/testutil/testutil_test.go approach.

mellon-collie and others added 4 commits July 31, 2023 13:26
1. Created a BoundedLabel type that stores the label names and their bounded set of label values, and made relevant changes to InitCounterVec design to initialize counters to 0
2. Introduced "panic" interrupts if the error during registration is not AlreadyRegisteredError
1. With reference to  metrics in Points 1,2,3 of text description in the design doc, use struct instead of an interface.
2. Rewrite the example to use  operation_type and action_set_resolution as label names, rather than blueprint_name
3. Write 2 examples for creating label names to label value mappings - one for metrics where there are just 1-2 labels, where we use a direct prometheus call and one for 4-5 labels with the wrapper method.
1. Create a consistent interface for all methods - instead of passing just labels []string, pass BoundedLabels
2. Change the newMetrics method in controller example (design doc) to inject a new instance of the prometheus registry, instead of hardcoding the usage of a default directory.
@mellon-collie mellon-collie force-pushed the kanister-prometheus-skeleton-design-doc branch from b2ec569 to 784f5d6 Compare July 31, 2023 17:29
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice diagram. Concise and helpful.

#### Text description
1. The initializer of the consumer package calls new_metrics, a helper method that talks to Kanister’s metrics package. The result is a new metrics struct that owns all the prometheus collectors.

2. In order to initialize all the required prometheus collectors, the new_metrics method calls the InitCounterVec, InitGaugeVec, InitHistogramVec in the metrics package. It passes the metric names and the specific label names and label values, aka, BoundedLabels, to the metrics package. Once it initializes all the prometheus collectors successfully, it returns an struct that wraps all the collectors, which the consumer package can then use.
Copy link
Contributor

@ewhamilton ewhamilton Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
2. In order to initialize all the required prometheus collectors, the new_metrics method calls the InitCounterVec, InitGaugeVec, InitHistogramVec in the metrics package. It passes the metric names and the specific label names and label values, aka, BoundedLabels, to the metrics package. Once it initializes all the prometheus collectors successfully, it returns an struct that wraps all the collectors, which the consumer package can then use.
2. In order to initialize all the required prometheus metrics, the new_metrics method calls the InitCounterVec, InitGaugeVec, InitHistogramVec in the metrics package. It passes the metric names and the specific label names and label values as BoundedLabels to the metrics package. Once it initializes all the prometheus metrics successfully, it returns a struct that wraps all the metrics and that the consumer package can then use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


ii. If no, return a nil CounterVec and the received error.

3. If received a nil CounterVec from regitration, interrupt with a panic, because an interrupt would suggest a failure in the created CounterVec, which should be fixed by the programmer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
3. If received a nil CounterVec from regitration, interrupt with a panic, because an interrupt would suggest a failure in the created CounterVec, which should be fixed by the programmer.
3. If received a nil CounterVec from registration, interrupt with a panic, because an interrupt would suggest a failure in the created CounterVec, which should be fixed by the programmer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

![Alt text](Metrics_design.png?raw=true "Prometheus Integration Design")

#### Text description
1. The initializer of the consumer package calls new_metrics, a helper method that talks to Kanister’s metrics package. The result is a new metrics struct that owns all the prometheus collectors.
Copy link
Contributor

@ewhamilton ewhamilton Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the perspective of the most important audience, a Kanister developer, I believe we should refer to these as Prometheus metrics, not collectors.
And for consistency in this doc, we should capitalize Prometheus.

I've suggested some of these, but please sweep the document for all occurrences.

A Collector is an internal interface and refers more to the side of the metrics being collected by the Prometheus code that is invoked when the /metrics endpoint is scraped. But let's keep it simple for the Kanister developer.
(Yes, it's also the type returned for all metrics and material for unit tests, etc.,
but for getting Kanister developers to add Prometheus metrics, let's not complicate it.)

See overview terminology used at https://prometheus.io/docs/introduction/overview/

Suggested change
1. The initializer of the consumer package calls new_metrics, a helper method that talks to Kanister’s metrics package. The result is a new metrics struct that owns all the prometheus collectors.
1. The initializer of the consumer package calls new_metrics, a helper method that talks to Kanister’s metrics package. The result is a new metrics struct that owns all the Prometheus metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exposing this as "Prometheus Metrics" makes sense. I've made the change throughout the document.


2. In order to initialize all the required prometheus collectors, the new_metrics method calls the InitCounterVec, InitGaugeVec, InitHistogramVec in the metrics package. It passes the metric names and the specific label names and label values, aka, BoundedLabels, to the metrics package. Once it initializes all the prometheus collectors successfully, it returns an struct that wraps all the collectors, which the consumer package can then use.

3. The metrics package internally attempts to initialize the collectors and register the specific collectors with prometheus. If the registration fails because the specific metric with label header already exists, the collector will simply be returned to the caller. If the registration fails due to other reasons, then the metrics package will cause a panic, signalling programmer error. In case of the CounterVec, the InitCounterVec attempts to generate all possible permutations of label values and sets each counter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
3. The metrics package internally attempts to initialize the collectors and register the specific collectors with prometheus. If the registration fails because the specific metric with label header already exists, the collector will simply be returned to the caller. If the registration fails due to other reasons, then the metrics package will cause a panic, signalling programmer error. In case of the CounterVec, the InitCounterVec attempts to generate all possible permutations of label values and sets each counter
3. The metrics package internally initializes the Prometheus metrics and registers them Prometheus. If the registration fails because the specific metric with label header already exists, the metric will simply be returned to the caller. If the registration fails due to other reasons, then the metrics package will cause a panic, signaling programmer error. In case of the CounterVec, the InitCounterVec function generates all possible permutations of label values and initializes each counter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

2. In order to initialize all the required prometheus collectors, the new_metrics method calls the InitCounterVec, InitGaugeVec, InitHistogramVec in the metrics package. It passes the metric names and the specific label names and label values, aka, BoundedLabels, to the metrics package. Once it initializes all the prometheus collectors successfully, it returns an struct that wraps all the collectors, which the consumer package can then use.

3. The metrics package internally attempts to initialize the collectors and register the specific collectors with prometheus. If the registration fails because the specific metric with label header already exists, the collector will simply be returned to the caller. If the registration fails due to other reasons, then the metrics package will cause a panic, signalling programmer error. In case of the CounterVec, the InitCounterVec attempts to generate all possible permutations of label values and sets each counter
within the CounterVec to 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
within the CounterVec to 0.
within the CounterVec with a value of 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

3. The metrics package internally attempts to initialize the collectors and register the specific collectors with prometheus. If the registration fails because the specific metric with label header already exists, the collector will simply be returned to the caller. If the registration fails due to other reasons, then the metrics package will cause a panic, signalling programmer error. In case of the CounterVec, the InitCounterVec attempts to generate all possible permutations of label values and sets each counter
within the CounterVec to 0.

4. Once the collector is created in the metrics package, it will be returned to the consumer package’s new_metrics helper method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
4. Once the collector is created in the metrics package, it will be returned to the consumer package’s new_metrics helper method.
4. Once the Prometheus metric is created in the metrics package, it will be returned to the consumer package’s new_metrics helper method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


4. Once the collector is created in the metrics package, it will be returned to the consumer package’s new_metrics helper method.

5. Once the initialization of all collectors are complete, a new metrics struct will be returned to the consumer’s package initializer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
5. Once the initialization of all collectors are complete, a new metrics struct will be returned to the consumer’s package initializer.
5. Once the initialization of all metrics are complete, a new metrics struct will be returned to the consumer’s package initializer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


5. Once the initialization of all collectors are complete, a new metrics struct will be returned to the consumer’s package initializer.

6. Suppose the consumer package wants to increment a specific counter in a counter vec, it constructs a prometheus.Labels mapping using a helper method to retrieve the specific counter from the counter vec and performs an increment operation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
6. Suppose the consumer package wants to increment a specific counter in a counter vec, it constructs a prometheus.Labels mapping using a helper method to retrieve the specific counter from the counter vec and performs an increment operation.
6. The consumer package may find it usefule to implement a helper method that constructs
a prometheus.Labels mapping to access a specific counter from a counter vec and perform an increment operation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/usefule/useful/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed




### Low Level APIs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got some questions about how to handle APIs in this document. I don't know what guidance you've been given or what the Kanister project has for documentation guidelines.

My 2 concerns relate to whether to include code fragments for APIs in design docs and if so to what degree.

For now I'm going to ignore that process issue and will make any substantive comments in the godoc headers for the code in #2163 .

design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
design/kanister-prometheus-integration.md Show resolved Hide resolved
design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
design/kanister-prometheus-integration.md Show resolved Hide resolved
2. In order to initialize all the required prometheus collectors, the new_metrics method calls the InitCounterVec, InitGaugeVec, InitHistogramVec in the metrics package. It passes the metric names and the specific label names and label values, aka, BoundedLabels, to the metrics package. Once it initializes all the prometheus collectors successfully, it returns an struct that wraps all the collectors, which the consumer package can then use.

3. The metrics package internally attempts to initialize the collectors and register the specific collectors with prometheus. If the registration fails because the specific metric with label header already exists, the collector will simply be returned to the caller. If the registration fails due to other reasons, then the metrics package will cause a panic, signalling programmer error. In case of the CounterVec, the InitCounterVec attempts to generate all possible permutations of label values and sets each counter
within the CounterVec to 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this small section hard-wrapped? Ideally, all lines within the Markdown file would be hard-wrapped to allow better Git diffs in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that. I've hard-wrapped all the lines to 80 characters for better readability.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miquella Don't we have some guideance somewhere about "semantic linefeeds" or such?

https://rhodesmill.org/brandon/2012/one-sentence-per-line/

Copy link
Contributor Author

@mellon-collie mellon-collie Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. This document states that any new sentence or a natural punctuation break triggers a new line.
Should I go ahead and make this change for the design doc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miquella Please opine on whether to convert to semantic linefeeds.

@mellon-collie You might also check with @pavannd1 or @viveksinghggits -- if they think it's a good idea for this doc (and generally for kanister docs?), then it would be reasonable to do so. I don't suggest doing it just on my mentioning it-- don't want to go back and forth without a general consensus.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agree with this. We recently started enforcing the 80-character limit on docs. It'll take a while to refactor older docs but we should definitely follow that moving forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, thanks @pavannd1 This design doc has been changed to follow this.

Copy link
Contributor

@miquella miquella Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ewhamilton: I haven't heard of such guidance for this project before and have always found it difficult to follow "semantic linefeeds", but that may just be because I am accustomed to column hard-wrapped text 🤷

Comment on lines 80 to 88
BoundedLabel{
LabelName: "operation_type"
LabelValues: ["backup", "restore"]
}

BoundedLabel{
LabelName: "action_set_resolution"
LabelValues: ["success", "failure"]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the indentation here doesn't match the indentation above for the BoundedLabel struct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miquella This isn't intended to be go code. It's just an example of how an instance of the struct will look like. Do we still need to maintain the indentation? Because the markdown isn't giving us the intention automatically.
Please let me know if there are any better ways to represent this example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mellon-collie: This issue is more that the rendered version of this document is inconsistent. This makes it more difficult to read the document and for no particular reason.

Furthermore, it may cause the reader to wonder why things are different and whether those differences important? (obviously indentation isn't important in Go, but distracting the reader limits their ability to consume and understand the document)

design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
@mellon-collie mellon-collie force-pushed the kanister-prometheus-skeleton-design-doc branch 2 times, most recently from ed89d99 to 37680c4 Compare August 2, 2023 17:00
@mellon-collie mellon-collie force-pushed the kanister-prometheus-skeleton-design-doc branch from 37680c4 to 7933ceb Compare August 2, 2023 17:01
@mellon-collie mellon-collie force-pushed the kanister-prometheus-skeleton-design-doc branch from 7933ceb to 5066985 Compare August 2, 2023 17:40
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File not updated to use newMetrics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I've made the update!

Copy link
Contributor

@ewhamilton ewhamilton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please do get 1 additional approval from one of your other reviewers on this as well.

Kanister automation moved this from In Progress to Reviewer approved Aug 7, 2023
Copy link
Contributor

@pavannd1 pavannd1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just reformatted the text to 80 chars per line. No major changes to the content.

design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
design/kanister-prometheus-integration.md Outdated Show resolved Hide resolved
@mellon-collie mellon-collie force-pushed the kanister-prometheus-skeleton-design-doc branch from efb4ced to 93651c0 Compare August 7, 2023 21:11
mellon-collie and others added 11 commits August 7, 2023 17:12
Co-authored-by: Pavan Navarathna <6504783+pavannd1@users.noreply.github.com>
Co-authored-by: Pavan Navarathna <6504783+pavannd1@users.noreply.github.com>
Co-authored-by: Pavan Navarathna <6504783+pavannd1@users.noreply.github.com>
Co-authored-by: Pavan Navarathna <6504783+pavannd1@users.noreply.github.com>
Co-authored-by: Pavan Navarathna <6504783+pavannd1@users.noreply.github.com>
Co-authored-by: Pavan Navarathna <6504783+pavannd1@users.noreply.github.com>
Co-authored-by: Pavan Navarathna <6504783+pavannd1@users.noreply.github.com>
Co-authored-by: Pavan Navarathna <6504783+pavannd1@users.noreply.github.com>
Co-authored-by: Pavan Navarathna <6504783+pavannd1@users.noreply.github.com>
Co-authored-by: Pavan Navarathna <6504783+pavannd1@users.noreply.github.com>
@pavannd1 pavannd1 added the kueue label Aug 7, 2023
@mergify mergify bot merged commit 9c8e202 into master Aug 8, 2023
14 checks passed
Kanister automation moved this from Reviewer approved to Done Aug 8, 2023
@mergify mergify bot deleted the kanister-prometheus-skeleton-design-doc branch August 8, 2023 00:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants