Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NETOBSERV-466: Add metric to count packet number in FLP #179

Merged
merged 9 commits into from
Oct 13, 2022

Conversation

OlivierCazade
Copy link
Contributor

@OlivierCazade OlivierCazade commented Oct 10, 2022

This add a metric to count packet number.

Story was to add only total packet count but I aligned this with the bytes count, and we can still sum different tag to get the total number of packets.

@memodi
Copy link
Contributor

memodi commented Oct 10, 2022

@OlivierCazade Also, any reason we're not adding metric for "FlowDirection: 1" ?

cc @jotak - I believe this metric would also scale similarly as sent/received_bytes_total

@jotak
Copy link
Member

jotak commented Oct 11, 2022

@memodi yes ... I'd suggest some improvements in our handling of these metrics. I will create a new JIRA for that.

Basically what I'd like to do is to have some default config to not expose all these metrics. The mechanism to ignore metrics already exist, it works with the tags defined here and the ignoreMetrics config there.

Basically I think we could only expose received_bytes_total by default (btw: I don't like its name ... "received" is confusing, maybe we should use the terms ingress/egress instead) ; other metrics would have to be enabled explicitely.
So we could have this set of tags to control what we want:

  • ingress
  • egress
  • bytes
  • packets
  • workloads
  • namespaces

Also I'd like to introduce new similar metrics that have only namespace labels, not workloads, also disabled by default. So that can be used as a mitigation solution if users see too high cardinality with metrics, putting too much pressure on prometheus & FLP, without entirely turning off telemetry: they could turn off per-workload metrics and turn on per-namespace metrics instead.

As said, I'll create a new JIRA

@jotak
Copy link
Member

jotak commented Oct 11, 2022

Also, any reason we're not adding metric for "FlowDirection: 1" ?

+1 I think we should provide that as well (it can catch some traffic not caught on ingress), however, as said above, we may want to disable it through config by default.

@OlivierCazade
Copy link
Contributor Author

Thanks for feedback!

Also, any reason we're not adding metric for "FlowDirection: 1" ?

The reason for not catching FLowDirection: 1 was to be consistent with the received_bytes_total metric

+1 I think we should provide that as well (it can catch some traffic not caught on ingress)

This is true, this mean this metrics are reporting wrong value, should we remove both until we find a way to clean duplicate? May be we have to wait for connection tracking

Would you be fine with splitting these metrics into four ?
Something like captured_ingress_bytes_total captured_egress_bytes_total captured_ingress_packets_total and captured_egress_packets_total (and may be enabling by default only (captured_ingress_bytes_total`).

@OlivierCazade
Copy link
Contributor Author

OlivierCazade commented Oct 11, 2022

I modified current metrics tag and default ignored metrics.

This PR also address part of this task :
https://issues.redhat.com/browse/NETOBSERV-647

The only remaining part is about having "light" metrics with less cardinality.

@memodi , the last part is renaiming the metrics. Is it fine or would it create a lot of work on your side?

@memodi
Copy link
Contributor

memodi commented Oct 11, 2022

@memodi , the last part is renaiming the metrics. Is it fine or would it create a lot of work on your side?

if it's absolutely desired, go ahead renaming them. thanks!

- egress
- bytes
- workloads
- namespaces
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't set "namespaces" here, else if we later add namespace-only metrics later, we wouldn't be able to filter them out without impacting this one as well (or we'd need to find a trick to do it)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, depends how we do the namespace-only metrics ... you plan to add label-removal feature like we discussed previously, right? If so, then it's ok for me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to do the label-removal, but I am not sure that this is the priority.

I removed the namespaces label and I will add it again once the label removal task is done.

- egress
- packets
- workloads
- namespaces
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here and same for the other workload metrics

valuekey: Packets
filter:
key: FlowDirection
value: "0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be "1" for egress

@memodi
Copy link
Contributor

memodi commented Oct 13, 2022

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Oct 13, 2022
@github-actions
Copy link

New image: ["quay.io/netobserv/network-observability-operator:f831013"]. It will expire after two weeks.

@OlivierCazade
Copy link
Contributor Author

/approve

@openshift-ci
Copy link

openshift-ci bot commented Oct 13, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: OlivierCazade

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit af38913 into netobserv:main Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm ok-to-test To set manually when a PR is safe to test. Triggers image build on PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants