NETOBSERV-1790: Manage enrichment via "k8s.v1.cni.cncf.io/network-status" #674

Open: wants to merge 5 commits into base: main

jotak
Member

@jotak jotak commented Jun 13, 2024

Description

Manage enrichment by extracting pod IPs from the annotation "k8s.v1.cni.cncf.io/network-status", which is used by (at least) Multus.

This makes it possible to correlate Pods with their IPs on secondary interfaces.

Breaking change

The API for the add_kubernetes_infra transform stage has been modified to use the enriched name and namespace directly to determine the layer, instead of doing an informer lookup by IP. The reason is that informers can now index by MAC, so some pods are no longer reachable through an IP lookup.

To migrate, if an add_kubernetes_infra rule was configured like this:

    kubernetes_infra:
      inputs:
      - SrcAddr
      - DstAddr
      output: K8S_FlowLayer

it must be changed to:

    kubernetes_infra:
      namespaceNameFields:
      - namespace: SrcK8S_Namespace
        name: SrcK8S_Name
      - namespace: DstK8S_Namespace
        name: DstK8S_Name
      output: K8S_FlowLayer

where SrcK8S_Namespace, SrcK8S_Name, DstK8S_Namespace and DstK8S_Name match the names of the corresponding enriched fields.
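
As an illustration of the migrated rule, here is a minimal Go sketch of how a layer could be derived from already-enriched namespace fields rather than from an IP lookup. It is not the code from this PR: the `nsNameFields` type, the `flowLayer` and `isInfra` helpers, the prefix-based infra test, and the "infra"/"app" values are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// nsNameFields mirrors one entry of namespaceNameFields in the rule above.
// The type itself is hypothetical.
type nsNameFields struct {
	Namespace string // flow field holding the enriched namespace
	Name      string // flow field holding the enriched name
}

// isInfra is an assumed classifier: a namespace counts as infrastructure
// when it starts with one of the given prefixes.
func isInfra(namespace string, infraPrefixes []string) bool {
	for _, p := range infraPrefixes {
		if strings.HasPrefix(namespace, p) {
			return true
		}
	}
	return false
}

// flowLayer returns "app" as soon as one enriched endpoint is outside the
// infra namespaces, "infra" when every enriched endpoint is infra, and ""
// when no endpoint carries a namespace. Cluster-scoped objects (no
// namespace) are ignored in this sketch.
func flowLayer(flow map[string]interface{}, fields []nsNameFields, infraPrefixes []string) string {
	layer := ""
	for _, f := range fields {
		ns, _ := flow[f.Namespace].(string)
		if ns == "" {
			continue // endpoint was not enriched with a namespace
		}
		if !isInfra(ns, infraPrefixes) {
			return "app"
		}
		layer = "infra"
	}
	return layer
}

func main() {
	flow := map[string]interface{}{
		"SrcK8S_Namespace": "openshift-dns", "SrcK8S_Name": "dns-default",
		"DstK8S_Namespace": "my-app", "DstK8S_Name": "frontend",
	}
	fields := []nsNameFields{
		{Namespace: "SrcK8S_Namespace", Name: "SrcK8S_Name"},
		{Namespace: "DstK8S_Namespace", Name: "DstK8S_Name"},
	}
	flow["K8S_FlowLayer"] = flowLayer(flow, fields, []string{"openshift", "netobserv"})
	fmt.Println(flow["K8S_FlowLayer"]) // app
}
```

Because the function only reads fields that are already on the flow, it works the same way whether the pod was matched by IP or by MAC during enrichment.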

Dependencies

netobserv/network-observability-operator#732 for MAC enrichment

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jun 13, 2024

openshift-ci bot commented Jun 13, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jotak. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

New image:
quay.io/netobserv/flowlogs-pipeline:dfe4fdf

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=dfe4fdf make set-flp-image

codecov bot commented Jun 13, 2024

Codecov Report

Attention: Patch coverage is 65.43210% with 56 lines in your changes missing coverage. Please review.

Project coverage is 65.26%. Comparing base (f5d2cef) to head (abc8b6e).
Report is 1 commits behind head on main.

Files Patch % Lines
...peline/transform/kubernetes/informers/informers.go 32.07% 35 Missing and 1 partial ⚠️
pkg/pipeline/transform/kubernetes/cni/multus.go 63.63% 8 Missing ⚠️
...ipeline/transform/kubernetes/cni/ovn_kubernetes.go 28.57% 5 Missing ⚠️
pkg/pipeline/transform/transform_network.go 80.00% 3 Missing and 1 partial ⚠️
pkg/pipeline/transform/transform_filter.go 71.42% 1 Missing and 1 partial ⚠️
...e/transform/kubernetes/informers/informers-mock.go 96.29% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #674      +/-   ##
==========================================
- Coverage   65.36%   65.26%   -0.10%     
==========================================
  Files         107      108       +1     
  Lines        6874     6953      +79     
==========================================
+ Hits         4493     4538      +45     
- Misses       2071     2104      +33     
- Partials      310      311       +1     
Flag Coverage Δ
unittests 65.26% <65.43%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
pkg/api/transform_network.go 100.00% <ø> (ø)
pkg/config/generic_map.go 100.00% <100.00%> (ø)
pkg/pipeline/transform/kubernetes/enrich.go 79.56% <100.00%> (+2.39%) ⬆️
...e/transform/kubernetes/informers/informers-mock.go 97.05% <96.29%> (+0.54%) ⬆️
pkg/pipeline/transform/transform_filter.go 90.00% <71.42%> (+0.10%) ⬆️
pkg/pipeline/transform/transform_network.go 68.24% <80.00%> (+0.43%) ⬆️
...ipeline/transform/kubernetes/cni/ovn_kubernetes.go 64.86% <28.57%> (-3.71%) ⬇️
pkg/pipeline/transform/kubernetes/cni/multus.go 63.63% <63.63%> (ø)
...peline/transform/kubernetes/informers/informers.go 18.29% <32.07%> (+0.24%) ⬆️

... and 1 file with indirect coverage changes

Collaborator

@jpinsonneau jpinsonneau left a comment

LGTM, thanks @jotak !

Comment on lines 42 to 50
    err := json.Unmarshal([]byte(statusAnnotationJSON), &networks)
    if err == nil {
        for _, network := range networks {
            allIPs = append(allIPs, network.IPs...)
        }
        return allIPs, nil
    }

    return nil, fmt.Errorf("cannot read annotation %s: %w", statusAnnotation, err)
Collaborator

Since this part of the code will be run at the pod informer initialisation, it can take a lot of CPU at FLP startup.

We should ensure this works fine on huge clusters.
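
For readers unfamiliar with the annotation, the parsing discussed in this thread can be sketched as a standalone program. This is an illustration only, not the PR's code: the `networkStatus` struct keeps just a few fields of the annotation's JSON, `extractIPs` is a hypothetical helper, and the sample annotation values are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const statusAnnotation = "k8s.v1.cni.cncf.io/network-status"

// networkStatus keeps only the fields this sketch needs from each entry
// of the annotation's JSON array.
type networkStatus struct {
	Name      string   `json:"name"`
	Interface string   `json:"interface"`
	IPs       []string `json:"ips"`
	MAC       string   `json:"mac"`
}

// extractIPs (hypothetical helper) returns every IP listed in the
// network-status annotation, across all attached networks.
func extractIPs(annotations map[string]string) ([]string, error) {
	raw, ok := annotations[statusAnnotation]
	if !ok || raw == "" {
		return nil, nil // no annotation: skip JSON decoding entirely
	}
	var networks []networkStatus
	if err := json.Unmarshal([]byte(raw), &networks); err != nil {
		return nil, fmt.Errorf("cannot read annotation %s: %w", statusAnnotation, err)
	}
	var allIPs []string
	for _, n := range networks {
		allIPs = append(allIPs, n.IPs...)
	}
	return allIPs, nil
}

func main() {
	// Made-up pod annotations with a primary and a secondary interface.
	pod := map[string]string{statusAnnotation: `[
	  {"name":"ovn-kubernetes","interface":"eth0","ips":["10.128.0.5"],"mac":"0a:58:0a:80:00:05","default":true},
	  {"name":"default/macvlan-conf","interface":"net1","ips":["192.168.1.10"],"mac":"86:1d:96:ff:55:0d"}
	]`}
	ips, err := extractIPs(pod)
	fmt.Println(ips, err) // [10.128.0.5 192.168.1.10] <nil>
}
```

Returning early when the annotation is absent keeps the per-pod cost close to zero for pods without it, which may help with the startup concern raised above; whether that is enough on very large clusters would still need measuring.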

@openshift-ci openshift-ci bot added the lgtm label Jul 19, 2024
@openshift-ci openshift-ci bot removed the lgtm label Aug 5, 2024
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 5, 2024
@jpinsonneau
Collaborator

jpinsonneau commented Aug 5, 2024

f39388d
^ checks MAC addresses when possible

@jpinsonneau jpinsonneau changed the title Manage enrichment via "k8s.v1.cni.cncf.io/network-status" NETOBSERV-1790 Manage enrichment via "k8s.v1.cni.cncf.io/network-status" Aug 5, 2024
Comment on lines 26 to 27
// TODO: replace this by a new rule input
mac := fmt.Sprintf("%s", outputEntry[strings.Replace(rule.Input, "Addr", "Mac", -1)])
Collaborator

This needs to be changed before merging.
I'm still discussing whether to put this behind a feature gate at this stage.

Collaborator

mac := fmt.Sprintf("%s", outputEntry[strings.Replace(rule.Input, "Addr", "Mac", -1)])
var mac string
if len(rule.MacInput) > 0 {
mac = fmt.Sprintf("%s", outputEntry[rule.MacInput])
Member Author

For info: for performance, we should avoid fmt.Sprintf("%s", ...) wherever possible, especially in the flow-processing path. This applies not only here but in a number of other places too. I've just created a task for that: https://issues.redhat.com/browse/NETOBSERV-1799
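
One possible shape for such a replacement is sketched below, assuming flow values are stored as `interface{}` in a map. The `toString` helper is hypothetical and is not taken from this PR or from NETOBSERV-1799. It returns strings through a type assertion and only falls back to `fmt` for other types.

```go
package main

import (
	"fmt"
)

// toString (hypothetical helper) converts a flow value to a string while
// avoiding fmt formatting on the common path where the value is already
// a string.
func toString(v interface{}) string {
	switch s := v.(type) {
	case string:
		return s // no formatting machinery involved
	case nil:
		return "" // missing key: empty string instead of "%!s(<nil>)"
	default:
		return fmt.Sprint(s) // rare path: fall back to fmt
	}
}

func main() {
	flow := map[string]interface{}{"SrcMac": "0a:58:0a:80:00:05", "Bytes": 1500}
	fmt.Println(toString(flow["SrcMac"]))       // 0a:58:0a:80:00:05
	fmt.Println(toString(flow["Bytes"]))        // 1500
	fmt.Println(toString(flow["DstMac"]) == "") // true
}
```

Note that a missing key behaves differently from `fmt.Sprintf("%s", ...)`, which yields `%!s(<nil>)` for a nil value; callers that test for an empty MAC would probably prefer the empty string, but that is a design choice to confirm.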

@jotak
Member Author

jotak commented Aug 13, 2024

thanks @jpinsonneau !
/lgtm

openshift-ci bot commented Aug 13, 2024

@jotak: you cannot LGTM your own PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jotak jotak changed the title NETOBSERV-1790 Manage enrichment via "k8s.v1.cni.cncf.io/network-status" NETOBSERV-1790: Manage enrichment via "k8s.v1.cni.cncf.io/network-status" Aug 13, 2024
@openshift-ci-robot
Collaborator

openshift-ci-robot commented Aug 13, 2024

@jotak: This pull request references NETOBSERV-1790 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the lgtm label Aug 13, 2024
jotak and others added 4 commits August 14, 2024 09:47
Manage enrichment by extracting pod IPs from the annotation
"k8s.v1.cni.cncf.io/network-status", which is used (at least) by multus

This allows to correlate Pods with their IPs on secondary interfaces

openshift-ci bot commented Aug 14, 2024

New changes are detected. LGTM label has been removed.

@jotak jotak added the breaking-change This pull request has breaking changes. They should be described in PR description. label Aug 14, 2024
@openshift-ci-robot
Collaborator

openshift-ci-robot commented Aug 14, 2024

@jotak: This pull request references NETOBSERV-1790 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.


Do not use IP lookup, since now some pods are indexed via MAC and not
IPs
@msherif1234
Contributor

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Aug 21, 2024
@openshift-merge-robot
Collaborator

PR needs rebase.

New image:
quay.io/netobserv/flowlogs-pipeline:f457772

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=f457772 make set-flp-image

Labels
breaking-change This pull request has breaking changes. They should be described in PR description. jira/valid-reference needs-rebase ok-to-test To set manually when a PR is safe to test. Triggers image build on PR.
5 participants