Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NETOBSERV-1275: Introduce new "INNER" direction for inner-node traffic #483

Merged
merged 3 commits into from
Sep 8, 2023

Conversation

jotak
Copy link
Member

@jotak jotak commented Sep 1, 2023

Related PR: netobserv/network-observability-console-plugin#378

The flows (and duplicates) generated for inner-node traffic differs compared to node-to-node traffic, and reinterpret direction isn't able to decide between ingress or egress. This is causing discrepancies with the dedup mechanism that filters out flows where Duplicate=true and also favors ingress over egress, since the Duplicate flag can be set randomly on ingress or on egress.

To fix that, the proposed solution is to create this new INNER direction specifically for this kind of traffic. Deduping this INNER traffic can then rely solely on the Duplicate flag, since that flag was set from a single Agent (single node) there will always be only one Duplicate=false.

@jotak
Copy link
Member Author

jotak commented Sep 1, 2023

TODO: doc update done

jotak added a commit to jotak/network-observability-console-plugin that referenced this pull request Sep 1, 2023
Related PR: netobserv/flowlogs-pipeline#483

The flows (and duplicates) generated for inner-node traffic differs compared to node-to-node traffic, and reinterpret direction isn't able to decide between ingress or egress. This is causing discrepancies with the dedup mechanism that filters out flows where Duplicate=true and also favors ingress over egress, since the Duplicate flag can be set randomly on ingress or on egress.

To fix that, the proposed solution is to create this new INNER direction specifically for this kind of traffic. Deduping this INNER traffic can then rely solely on the Duplicate flag, since that flag was set from a single Agent (single node) there will always be only one Duplicate=false.
@codecov
Copy link

codecov bot commented Sep 1, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (e958fe7) 66.03% compared to head (f8de62d) 66.03%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #483   +/-   ##
=======================================
  Coverage   66.03%   66.03%           
=======================================
  Files          94       94           
  Lines        6921     6921           
=======================================
  Hits         4570     4570           
  Misses       2092     2092           
  Partials      259      259           
Flag Coverage Δ
unittests 66.03% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
pkg/api/transform_network.go 83.33% <ø> (ø)
pkg/pipeline/transform/transform_network.go 72.04% <100.00%> (-0.52%) ⬇️
.../pipeline/transform/transform_network_direction.go 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.

📢 Have feedback on the report? Share it here.

@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 1, 2023
@github-actions
Copy link

github-actions bot commented Sep 1, 2023

New image:
quay.io/netobserv/flowlogs-pipeline:deb2e32

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=deb2e32 make set-flp-image

@jotak jotak changed the title Introduce new "INNER" direction for inner-node traffic NETOBSERV-1275: Introduce new "INNER" direction for inner-node traffic Sep 1, 2023
@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Sep 1, 2023

@jotak: This pull request references NETOBSERV-1275 which is a valid jira issue.

In response to this:

Related PR: netobserv/network-observability-console-plugin#378

The flows (and duplicates) generated for inner-node traffic differs compared to node-to-node traffic, and reinterpret direction isn't able to decide between ingress or egress. This is causing discrepancies with the dedup mechanism that filters out flows where Duplicate=true and also favors ingress over egress, since the Duplicate flag can be set randomly on ingress or on egress.

To fix that, the proposed solution is to create this new INNER direction specifically for this kind of traffic. Deduping this INNER traffic can then rely solely on the Duplicate flag, since that flag was set from a single Agent (single node) there will always be only one Duplicate=false.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 1, 2023
@jotak jotak marked this pull request as ready for review September 1, 2023 13:59
jpinsonneau
jpinsonneau previously approved these changes Sep 4, 2023
Copy link
Collaborator

@jpinsonneau jpinsonneau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, let's just wait for plugin PR before merge.

Edit: what about managing NETOBSERV-1109 in this PR at the same time ?
We would just need to revert a05a86d

@jpinsonneau
Copy link
Collaborator

Another case where reinterpret doesn't work as expected is when service is involved.
In that situation the related node is not set and reinterpret is done incorrectly.

{
  "AgentIP": "10.0.1.79",
  "Bytes": 151,
  "DstAddr": "10.129.0.33",
  "DstK8S_HostIP": "10.0.1.79",
  "DstK8S_HostName": "ip-10-0-1-79.ec2.internal",
  "DstK8S_Name": "flowlogs-pipeline-tfbtz",
  "DstK8S_Namespace": "netobserv",
  "DstK8S_OwnerName": "flowlogs-pipeline",
  "DstK8S_OwnerType": "DaemonSet",
  "DstK8S_Type": "Pod",
  "DstMac": "0A:58:0A:81:00:21",
  "DstPort": 52682,
  "Duplicate": false,
  "Etype": 2048,
  "FlowDirection": "0",
  "IfDirection": 1,
  "Interface": "947d95f7ea5991c",
  "K8S_ClusterName": "8c12585d-2e05-47b7-b41d-a44f7e4b93f0",
  "Packets": 1,
  "Proto": 17,
  "SrcAddr": "172.30.0.10",
  "SrcK8S_Name": "dns-default",
  "SrcK8S_Namespace": "openshift-dns",
  "SrcK8S_OwnerName": "dns-default",
  "SrcK8S_OwnerType": "Service",
  "SrcK8S_Type": "Service",
  "SrcMac": "0A:58:0A:81:00:06",
  "SrcPort": 53,
  "TimeFlowEndMs": 1693833862664,
  "TimeFlowStartMs": 1693833862664,
  "TimeReceived": 1693833866,
  "_HashId": "b3de5075fb27159f",
  "_RecordType": "flowLog",
  "app": "netobserv-flowcollector"
}

@jotak
Copy link
Member Author

jotak commented Sep 5, 2023

Another case where reinterpret doesn't work as expected is when service is involved. In that situation the related node is not set and reinterpret is done incorrectly.

In this example, AgentIP == DstK8S_HostIP so reinterpret says it's an Ingress (seems like this is a DNS response observed from the sender). Why is it incorrect? It seems an ingress indeed, despite not knowing the other end's host IP ?

@jotak
Copy link
Member Author

jotak commented Sep 5, 2023

@jpinsonneau for conversation I can remove that if, you think it's good enough? (no change to do on query side?)

@openshift-ci
Copy link

openshift-ci bot commented Sep 5, 2023

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci bot removed the lgtm label Sep 5, 2023
@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 5, 2023
@github-actions
Copy link

github-actions bot commented Sep 5, 2023

New image:
quay.io/netobserv/flowlogs-pipeline:6048977

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=6048977 make set-flp-image

@jpinsonneau
Copy link
Collaborator

In this example, AgentIP == DstK8S_HostIP so reinterpret says it's an Ingress (seems like this is a DNS response observed from the sender). Why is it incorrect? It seems an ingress indeed, despite not knowing the other end's host IP ?

DNS requests and responses are using the same flow, one egress and one ingress.
https://github.com/netobserv/netobserv-ebpf-agent/blob/main/bpf/dns_tracker.h#L25-L28
https://issues.redhat.com/browse/NETOBSERV-1240

Would you expect Src and Dst to be flipped for the response ?

@jpinsonneau for conversation I can remove that if, you think it's good enough? (no change to do on query side?)

Yes please 👍 No changes on query side as far as I seen.

jotak added a commit to jotak/network-observability-console-plugin that referenced this pull request Sep 5, 2023
Related PR: netobserv/flowlogs-pipeline#483

The flows (and duplicates) generated for inner-node traffic differs compared to node-to-node traffic, and reinterpret direction isn't able to decide between ingress or egress. This is causing discrepancies with the dedup mechanism that filters out flows where Duplicate=true and also favors ingress over egress, since the Duplicate flag can be set randomly on ingress or on egress.

To fix that, the proposed solution is to create this new INNER direction specifically for this kind of traffic. Deduping this INNER traffic can then rely solely on the Duplicate flag, since that flag was set from a single Agent (single node) there will always be only one Duplicate=false.
@jotak
Copy link
Member Author

jotak commented Sep 5, 2023

Would you expect Src and Dst to be flipped for the response ?

Well this is how it has always worked with netflows, so yes, for the response, Src must be the DNS server, and Dst must be the querier

Edit: but it seems to be already the case, no?

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 5, 2023
@jpinsonneau
Copy link
Collaborator

Would you expect Src and Dst to be flipped for the response ?

Well this is how it has always worked with netflows, so yes, for the response, Src must be the DNS server, and Dst must be the querier

Edit: but it seems to be already the case, no?

Yes my bad, I retested and see 1 egress (Pod -> Service) and 2 ingress (Service -> Pod) with one duplicate.
But why don't we consider these as Inner since services are running on every nodes ?

@jotak
Copy link
Member Author

jotak commented Sep 5, 2023

Services aren't really running on nodes, I'm not sure we could consider them as "Inner traffic" ...
I think the root cause here for DNS is different, it's that the DNS tracker sets its data on a flow that potentially has been flagged as duplicate. Ideally, the DNS tracker should not do that ; it should pick the non-duplicate flow to enrich it , but I'm not sure we can do that in the bpf program .. cc @msherif1234 do you know?

The flows (and duplicates) generated for inner-node traffic differs
compared to node-to-node traffic, and reinterpret direction isn't able
to decide between ingress or egress. This is causing discrepancies with
the dedup mechanism that filters out flows where Duplicate=true and also
favors ingress over egress.

To fix that, the proposed solution is to create this new INNER direction
specifically for this kind of traffic. Deduping this INNER traffic can
then rely solely on the Duplicate flag, since that flag was set from a
single Agent (single node) there will always be only one
Duplicate=false.
@jotak
Copy link
Member Author

jotak commented Sep 8, 2023

Regarding my last comment: DNS duplicates are currently being considered via this ticket and dealt with from the Agent

@jotak
Copy link
Member Author

jotak commented Sep 8, 2023

/approve
will be tested by QE post-merge

@openshift-ci
Copy link

openshift-ci bot commented Sep 8, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Sep 8, 2023
@openshift-merge-robot openshift-merge-robot merged commit 058bc4e into netobserv:main Sep 8, 2023
8 checks passed
openshift-merge-robot pushed a commit to netobserv/network-observability-console-plugin that referenced this pull request Sep 8, 2023
#378)

* Introduce new "INNER" direction for inner-node traffic

Related PR: netobserv/flowlogs-pipeline#483

The flows (and duplicates) generated for inner-node traffic differs compared to node-to-node traffic, and reinterpret direction isn't able to decide between ingress or egress. This is causing discrepancies with the dedup mechanism that filters out flows where Duplicate=true and also favors ingress over egress, since the Duplicate flag can be set randomly on ingress or on egress.

To fix that, the proposed solution is to create this new INNER direction specifically for this kind of traffic. Deduping this INNER traffic can then rely solely on the Duplicate flag, since that flag was set from a single Agent (single node) there will always be only one Duplicate=false.

* fix fmt

* Fix test

* Update texts & API for doc

* fix linter
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants