NETOBSERV-578: deduplicate flows at agent level #56

Merged: 2 commits merged into main, Oct 3, 2022
Conversation

mariomac

Detects when the same flow is captured on different interfaces, and then forwards the flow only from the first interface it was received from.

On hosts with low traffic this decreased the number of flows by 20%, and on hosts with higher traffic, by 40%.

However, in OpenShift we could still have some duplicates, as the same flow can be detected by two agents on different hosts. In that case, a second deduplication should be performed at the flowlogs-pipeline level.
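The first-interface strategy described above can be sketched in Go. All names here (flowKey, record, deduper) are hypothetical simplifications, not the agent's actual RecordKey/Record types: the cache remembers the first interface a flow was seen on and drops the same flow arriving from any other interface.

```go
package main

// flowKey identifies a connection, deliberately excluding the interface.
// Hypothetical stand-in for the agent's real RecordKey.
type flowKey struct {
	srcIP, dstIP     string
	srcPort, dstPort uint16
	proto            uint8
}

type record struct {
	key   flowKey
	iface string
}

// deduper remembers the first interface each flow was seen on;
// records for the same flow from any other interface count as duplicates.
type deduper struct {
	firstIface map[flowKey]string
}

func newDeduper() *deduper {
	return &deduper{firstIface: map[flowKey]string{}}
}

func (d *deduper) isDupe(r record) bool {
	if first, ok := d.firstIface[r.key]; ok {
		return first != r.iface
	}
	// First sighting: remember the interface and forward the record.
	d.firstIface[r.key] = r.iface
	return false
}

// dedupe filters a batch, keeping only non-duplicate records.
func dedupe(d *deduper, records []record) []record {
	fwd := make([]record, 0, len(records))
	for _, r := range records {
		if !d.isDupe(r) {
			fwd = append(fwd, r)
		}
	}
	return fwd
}
```

Note this alone never forgets a flow; the real deduper also expires entries after a configurable timeout, as discussed further down in the review.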

@mariomac mariomac added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 29, 2022
@github-actions

New image: ["quay.io/netobserv/netobserv-ebpf-agent:87c9d1c"]. It will expire after two weeks.

@codecov-commenter

Codecov Report

Merging #56 (8b7a451) into main (ec135a6) will increase coverage by 2.72%.
The diff coverage is 88.70%.

@@            Coverage Diff             @@
##             main      #56      +/-   ##
==========================================
+ Coverage   32.04%   34.76%   +2.72%     
==========================================
  Files          18       19       +1     
  Lines        1239     1300      +61     
==========================================
+ Hits          397      452      +55     
- Misses        823      829       +6     
  Partials       19       19              
Flag Coverage Δ
unittests 34.76% <88.70%> (+2.72%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pkg/agent/agent.go 14.91% <0.00%> (-0.52%) ⬇️
pkg/flow/deduper.go 100.00% <100.00%> (ø)


OlivierCazade
OlivierCazade previously approved these changes Oct 3, 2022
Contributor

@OlivierCazade left a comment:

LGTM, just a few comments, but they are not blocking; feel free to ignore them.

// value: listElement pointing to a struct entry
ifaces map[RecordKey]*list.Element
// element: entry structs of the ifaces map ordered by expiry time
ll *list.List
Contributor

Nitpicking: maybe ll is too generic; maybe something like RecRefs?

for _, record := range records {
if !cache.isDupe(&record.RecordKey) {
fwd = append(fwd, record)
}
Contributor

@OlivierCazade Oct 3, 2022

Possible improvement: adding a metric to count the number of duplicates.

Author

I added a debug log below. If you mean Prometheus metrics, maybe we can create a new task to integrate a metrics server here.

Contributor

Yes, I was talking about a Prometheus metric, and I am fine with creating a different task for it.

// a flow hasn't been received for that expiry time, the deduplicator forgets it. That means
// that a flow from a connection that has been inactive during that period could be forwarded
// again from a different interface.
DeduperFCExpiry time.Duration `env:"DEDUPER_FC_EXPIRY" envDefault:"30s"`
Contributor

Isn't the 30s default a bit too high, causing unnecessary memory consumption?

For the same flow captured going through two different interfaces on the same node, I would have thought the gap was below one second. So maybe a default value like 5s would be enough?

Author

Hmm, since I want to be consistent with the flows' flush periods, this should be greater than one cache flush period. So we can probably set it, by default, to CacheActiveTimeout * 2 (10s).

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Oct 3, 2022
@mariomac mariomac merged commit a3e5f9a into netobserv:main Oct 3, 2022
@mariomac mariomac deleted the dedupe branch October 3, 2022 12:16