-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NETOBSERV-739: Add prometheus #613
Conversation
Skipping CI for Draft Pull Request. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@jotak: This pull request references NETOBSERV-739 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@jotak: This pull request references NETOBSERV-739 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
New images:
They will expire after two weeks. To deploy this build: # Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:dd87eac make deploy
# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-dd87eac Or as a Catalog Source: apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
name: netobserv-dev
namespace: openshift-marketplace
spec:
sourceType: grpc
image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-dd87eac
displayName: NetObserv development catalog
publisher: Me
updateStrategy:
registryPoll:
interval: 1m |
// `prometheusClient` defines Prometheus client settings, used to fetch metrics from the Console plugin. | ||
PrometheusClient FlowCollectorPrometheusClient `json:"prometheusClient,omitempty"` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// `prometheusClient` defines Prometheus client settings, used to fetch metrics from the Console plugin. | |
PrometheusClient FlowCollectorPrometheusClient `json:"prometheusClient,omitempty"` | |
// `prometheus` defines Prometheus client settings, used to fetch metrics from the Console plugin. | |
Prometheus FlowCollectorPrometheus `json:"prometheus,omitempty"` |
For consistency, I feel it would be better to simply have prometheus
as same as loki
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that could contains FLP prometheus settings in future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idk, I added "client" precisely to avoid confusion with the metrics producing side.
Like, if we use prometheus
, then what would the user expect the setting prometheus.enable
to be? Likely that it's going to turn off metrics generation, which is not the case.
Or should we come up with something like:
prometheus:
querier:
<insert all settings>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense indeed. And we should also end up allowing users to disable prometheus export from FLP too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the current way to do this would be to set an empty includeList
of metrics
New images:
They will expire after two weeks. To deploy this build: # Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:a7cd60f make deploy
# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-a7cd60f Or as a Catalog Source: apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
name: netobserv-dev
namespace: openshift-marketplace
spec:
sourceType: grpc
image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-a7cd60f
displayName: NetObserv development catalog
publisher: Me
updateStrategy:
registryPoll:
interval: 1m |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #613 +/- ##
==========================================
- Coverage 67.00% 66.53% -0.47%
==========================================
Files 68 70 +2
Lines 7804 8095 +291
==========================================
+ Hits 5229 5386 +157
- Misses 2197 2315 +118
- Partials 378 394 +16
Flags with carried forward coverage won't be shown. Click here to find out more.
|
@jotak: This pull request references NETOBSERV-739 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@jotak: This pull request references NETOBSERV-739 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
New images:
They will expire after two weeks. To deploy this build: # Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:3e9c83b make deploy
# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-3e9c83b Or as a Catalog Source: apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
name: netobserv-dev
namespace: openshift-marketplace
spec:
sourceType: grpc
image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-3e9c83b
displayName: NetObserv development catalog
publisher: Me
updateStrategy:
registryPoll:
interval: 1m |
@memodi FYI: last build adds fix for NETOBSERV-1654 and also a small perf improvement, using exact match with the default quick filters instead of partial matching |
default: true | ||
- name: Services network | ||
filter: | ||
dst_kind: 'Service' | ||
dst_kind: '"Service"' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is bit ugly and not user friendly why did u change the APIs
https://github.com/netobserv/network-observability-operator/pull/613/files#diff-1aad53ab28e869c35c03e94f19d4e37402b869c34eabfd55317931ab5e2b1436R863 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc'd @memodi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a perf improvement for queries, have quotes means doing an exact match instead of a partial match, which is more performant
another than the API question above |
this would need update to our tests that verifies "filters" |
If you feel this shouldn't be in this PR I can remove it, but it seems to me it's a quick & easy way to improve query perf |
no worries, it's okay to have it in this PR, I just wanted to make a note of it. |
@jotak the other thing I am noticing is below when flowcollector is created newly, plugin goes from Running --> Terminating --> Running state and I see below events for plugin before it stabilizes, not sure why it's scale down and back up:
confirming this doesn't happen on a latest bundle, but only when Operator bundle is |
@memodi I don't reproduce this either :-( |
@jotak here's the must-gather tar file: https://drive.google.com/file/d/1O6eduOKzr8SqoxZyylJ1HKxsG8u9eZm9/view?usp=drive_link |
- FlowCollector CRD: new config for prometheus client - Allow disabling Loki and still use the console plugin (with prometheus) - Add some labels in metrics to maximize coverage of plugin queries to prom: K8S_FlowLayer, Src|DstK8S_Type on workload metrics Fix configuring metrics with openshift Use explicit metrics config, use enable bool for prom fix tests Use nested `prometheus.querier` in API
New changes are detected. LGTM label has been removed. |
Ok @memodi I think I've fixed it, that's a weird thing I never noticed before, but it seems like when you apply a custom resource without setting fields explicitly, relying on their default value, there's a small time where the CR is actually stored without the fields set, and quickly after it's automatically updated with the default fields. Here, I found that your custom resource did not have any config for prom, so it was first seeing So my fix just considers empty mode is equivalent to "Auto" mode |
New images:
They will expire after two weeks. To deploy this build: # Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:1c6ec02 make deploy
# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-1c6ec02 Or as a Catalog Source: apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
name: netobserv-dev
namespace: openshift-marketplace
spec:
sourceType: grpc
image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-1c6ec02
displayName: NetObserv development catalog
publisher: Me
updateStrategy:
registryPoll:
interval: 1m |
thanks @jotak , yes, we usually work with defaults in our tests as much as possible and turn on/off only if we need to for our testing. Last commit fixed the bug. /label qe-approved |
@jotak: This pull request references NETOBSERV-739 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
unit tests are failing weirdly on CI, they work fine locally - I'll see on Monday if that's all clean before merging |
Description
Adds a new config section in CRD to allow reading metrics via prometheus ( / thanos) :
This is enabled by default.
Dependencies
Checklist
If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.