Connection tracking - hash mechanism #201

Merged
merged 18 commits into from
May 24, 2022

Conversation

Collaborator

@ronensc ronensc commented May 11, 2022

This is the first PR of the new Connection tracking module. Only the hash mechanism is implemented here. Many more PRs are to come...

@ronensc ronensc requested review from eranra and KalmanMeth May 11, 2022 11:36

codecov-commenter commented May 11, 2022

Codecov Report

Merging #201 (862a303) into main (fdd8ca2) will increase coverage by 0.12%.
The diff coverage is 81.25%.

@@            Coverage Diff             @@
##             main     #201      +/-   ##
==========================================
+ Coverage   58.37%   58.49%   +0.12%     
==========================================
  Files          58       59       +1     
  Lines        3351     3409      +58     
==========================================
+ Hits         1956     1994      +38     
- Misses       1267     1282      +15     
- Partials      128      133       +5     
| Flag | Coverage Δ |
|---|---|
| unittests | 58.49% <81.25%> (+0.12%) ⬆️ |


| Impacted Files | Coverage Δ |
|---|---|
| ...ansform/connection_tracking/connection_tracking.go | 97.50% <ø> (ø) |
| pkg/pipeline/conntrack/hash.go | 81.25% <81.25%> (ø) |
| pkg/pipeline/encode/encode_prom.go | 81.77% <0.00%> (-1.78%) ⬇️ |
| pkg/pipeline/transform/transform_network.go | 60.00% <0.00%> (-0.84%) ⬇️ |
| pkg/pipeline/transform/kubernetes/kubernetes.go | 15.42% <0.00%> (+1.05%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fdd8ca2...862a303.


// Compute the total hash
hash := fnv.New32a()
for _, fgName := range keyFields.Hash.FieldGroups {
Collaborator

Can you explain somewhere why we need different FieldGroups in KeyFields and in Hash? What is the purpose of these additional FieldGroups?

Collaborator

One is of type []FieldGroup and the other is of type []string. Using the same name for these two different fields is a bit confusing. I assume that the []string is derived from the Name field of the []FieldGroup.

Collaborator Author

You are correct.
KeyFields.FieldGroups is where you define the field groups
and
KeyFields.Hash.FieldGroups is where you reference the defined field groups (by name). I'll rename it and add some comments to make it a bit clearer.

Please take a look at these configuration examples:
https://gist.github.com/ronensc/0136cd1c3a761e2d6067679362fa72c6
https://gist.github.com/ronensc/c6d0a07b0503bbd7a36bc4ef0b6b5b29
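The define-versus-reference split described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the struct layout is simplified (FieldGroupA and FieldGroupB are left out), resolveFieldGroups and exampleKeyFields are hypothetical helpers, and the field names "SrcAddr", "SrcPort" and "Proto" are made up for the example; it is not the code from this PR or from the linked gists.

```go
package main

import "fmt"

// FieldGroup defines a named set of flow-log fields.
// The layout is an assumption based on the names used in this thread.
type FieldGroup struct {
	Name   string
	Fields []string
}

// ConnTrackHash references field groups by name (simplified for this sketch).
type ConnTrackHash struct {
	FieldGroups []string
}

// KeyFields holds the definitions (FieldGroups) and the references (Hash.FieldGroups).
type KeyFields struct {
	FieldGroups []FieldGroup
	Hash        ConnTrackHash
}

// resolveFieldGroups is a hypothetical helper: it looks up every name referenced
// in kf.Hash.FieldGroups among the groups defined in kf.FieldGroups.
func resolveFieldGroups(kf KeyFields) ([]FieldGroup, error) {
	byName := make(map[string]FieldGroup, len(kf.FieldGroups))
	for _, fg := range kf.FieldGroups {
		byName[fg.Name] = fg
	}
	resolved := make([]FieldGroup, 0, len(kf.Hash.FieldGroups))
	for _, name := range kf.Hash.FieldGroups {
		fg, ok := byName[name]
		if !ok {
			return nil, fmt.Errorf("field group %q is referenced but not defined", name)
		}
		resolved = append(resolved, fg)
	}
	return resolved, nil
}

// exampleKeyFields builds an illustrative configuration (field names are assumptions).
func exampleKeyFields() KeyFields {
	return KeyFields{
		FieldGroups: []FieldGroup{
			{Name: "src", Fields: []string{"SrcAddr", "SrcPort"}},
			{Name: "dst", Fields: []string{"DstAddr", "DstPort"}},
			{Name: "protocol", Fields: []string{"Proto"}},
		},
		Hash: ConnTrackHash{FieldGroups: []string{"protocol"}},
	}
}

func main() {
	groups, err := resolveFieldGroups(exampleKeyFields())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, g := range groups {
		fmt.Println(g.Name, g.Fields)
	}
}
```

Resolving the names up front would also surface a reference to an undefined group as a configuration error rather than at hashing time.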

type ConnTrackHash struct {
FieldGroups []string `yaml:"fieldGroups" doc:"list of field groups"`
FieldGroupA string `yaml:"fieldGroupA" doc:"field group A"`
FieldGroupB string `yaml:"fieldGroupB" doc:"field group B"`
Collaborator

Do we need a description somewhere of what is meant by A and B?

Collaborator Author

I'll add a comment to describe their meaning.

},
Hash: api.ConnTrackHash{
FieldGroups: []string{"protocol"},
FieldGroupA: "src",
Collaborator

The use of FieldGroupA field (non-blank) should be explained somewhere.

// ComputeHash computes the hash of a flow log according to keyFields.
// Two flow logs will have the same hash if they belong to the same connection.
func ComputeHash(flowLog config.GenericMap, keyFields api.KeyFields) ([]byte, error) {
type hashType []byte
Collaborator

@ronensc If you externalize the hashType type, you will also be able to return hashType rather than []byte.

func computeHashFields(flowLog config.GenericMap, fieldNames []string) ([]byte, error) {
h := fnv.New32a()
for _, fn := range fieldNames {
// TODO: How should we handle a missing fieldName?
Collaborator

Log at info level and skip (for now). This will result in a different hash anyway. Maybe later we will make that a parameter.

}

// Compute the total hash
hash := fnv.New32a()
Collaborator

Can we somehow "trick" this (maybe with an interface?) so that the hash length to use becomes a parameter to the functions?

@mariomac maybe you have an idea ??? ^^^


Yeah, actually the hasher returned by fnv.New32a and the rest of the hashers implement the io.Writer interface. You can change the function signature to something like:

func computeHashFields(flowLog config.GenericMap, fieldNames []string, hasher io.Writer) ([]byte, error) {

and invoke it like:

h, err := computeHashFields(flowLog, fg.Fields, fnv.New32a())


Or, if you want to be even more restrictive about what can be passed as an argument, you can use the hash.Hash interface instead of io.Writer.
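A minimal sketch of the hash.Hash variant, assuming that GenericMap is a plain map, that missing fields are skipped, and that values are serialized with fmt (none of which is confirmed by this PR). The wrappers hash32 and hash64 are hypothetical and only show how the caller picks the hash length. Taking hash.Hash rather than io.Writer also gives the function access to Sum, which it needs to return the digest.

```go
package main

import (
	"fmt"
	"hash"
	"hash/fnv"
)

// GenericMap stands in for config.GenericMap (assumed to be a plain map).
type GenericMap map[string]interface{}

// computeHashFields writes the selected fields into the supplied hasher, so the
// caller decides which algorithm and digest length to use.
func computeHashFields(flowLog GenericMap, fieldNames []string, hasher hash.Hash) ([]byte, error) {
	hasher.Reset()
	for _, fn := range fieldNames {
		f, ok := flowLog[fn]
		if !ok {
			continue // assumption: a missing field is skipped
		}
		// Assumption: values are serialized as "name=value;" so that adjacent
		// fields cannot run together.
		if _, err := fmt.Fprintf(hasher, "%s=%v;", fn, f); err != nil {
			return nil, err
		}
	}
	return hasher.Sum(nil), nil
}

// hash32 and hash64 are hypothetical wrappers that choose the digest length.
func hash32(flowLog GenericMap, fieldNames []string) ([]byte, error) {
	return computeHashFields(flowLog, fieldNames, fnv.New32a())
}

func hash64(flowLog GenericMap, fieldNames []string) ([]byte, error) {
	return computeHashFields(flowLog, fieldNames, fnv.New64a())
}

func main() {
	flow := GenericMap{"DstAddr": "10.0.0.2", "DstPort": 80}
	fields := []string{"DstAddr", "DstPort"}
	h32, _ := hash32(flow, fields)
	h64, _ := hash64(flow, fields)
	fmt.Printf("%x (%d bytes)\n", h32, len(h32))
	fmt.Printf("%x (%d bytes)\n", h64, len(h64))
}
```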

Collaborator Author

👍

Collaborator

@eranra eranra left a comment

Looks good to me ... more than good, actually :-) I added some comments, but nothing really prevents merging.

f := flowLog[fn]
f, ok := flowLog[fn]
if !ok {
log.Warningf("Missing field %v", fn)

This could be risky. It could end up flooding the log file. Is there a way to avoid repeated logs?

Collaborator Author

We can replace it with a prometheus metric. WDYT?


I don't believe this needs to be tracked as a metric. It doesn't seem likely to occur, but it can if someone wants to be malicious. For now, maybe let it go, but we should have a general solution to avoid spamming the log file.
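One possible shape for such a general solution is a logger that emits each distinct message key only once. The sketch below is an assumption for illustration, not something agreed on in this thread: onceLogger and newOnceLogger are hypothetical names, and the standard library's log.Printf stands in for the project's logger.

```go
package main

import (
	"log"
	"sync"
)

// onceLogger (hypothetical) emits at most one message per key.
type onceLogger struct {
	seen sync.Map // keys that have already been logged
	logf func(format string, args ...interface{})
}

// newOnceLogger wraps any printf-style logging function.
func newOnceLogger(logf func(format string, args ...interface{})) *onceLogger {
	return &onceLogger{logf: logf}
}

// Warnf logs the message only the first time key is seen.
// It reports whether the message was actually logged.
func (l *onceLogger) Warnf(key, format string, args ...interface{}) bool {
	if _, alreadyLogged := l.seen.LoadOrStore(key, struct{}{}); alreadyLogged {
		return false
	}
	l.logf(format, args...)
	return true
}

func main() {
	missing := newOnceLogger(log.Printf)
	// Only the first call produces a log line; the repeats are dropped.
	for i := 0; i < 3; i++ {
		missing.Warnf("SrcPort", "Missing field %v", "SrcPort")
	}
}
```

Keying on the configured field name keeps the set of remembered keys bounded by the configuration. Keying on data taken from the flow logs themselves would let the set grow without limit, so a rate limiter would be a safer choice there.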

Name: "dst",
Fields: []string{
"DstAddr",
"DstPort",

If you include the ephemeral port in the hash, it will count a lot of connections. We need to agree on what a "connection" is. Example: let's say you access a web page. Is that one connection? With this implementation, it could be anywhere from 1 to 6 connections.

A typical web page refers to JavaScript files, CSS files, images, etc., and the browser needs to fetch these files. Pretty much all modern browsers allow up to 6 simultaneous connections to the same domain and reuse these connections to fetch all the files. Each of these connections has a different ephemeral port.

Collaborator Author

I see. I tried to make the decision of what a connection is configurable, but it's still not flexible enough to support this use case.

This specific unit test configures the classic 5-tuple to distinguish between connections. The other unit test uses a slightly different configuration. It uses the same 5-tuple but includes both flow directions in the same connection.
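To illustrate how a 5-tuple hash can place both flow directions in the same connection, here is a rough sketch. It assumes the two endpoint groups are hashed separately and then combined in a canonical order; whether the PR does exactly this is not shown in this thread. The helpers hashFields, connectionHash, exampleFlow and exampleHash are hypothetical, as are the field names other than DstAddr and DstPort.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/fnv"
)

// GenericMap stands in for config.GenericMap (assumed to be a plain map).
type GenericMap map[string]interface{}

// hashFields hashes only the field values, not their names, so that the "src"
// group of one direction matches the "dst" group of the reverse direction.
func hashFields(flowLog GenericMap, fields []string) []byte {
	h := fnv.New32a()
	for _, fn := range fields {
		if f, ok := flowLog[fn]; ok {
			fmt.Fprintf(h, "%v;", f)
		}
	}
	return h.Sum(nil)
}

// connectionHash combines the common groups with groups A and B.
// A and B are sorted before being combined, which makes the result
// independent of the flow direction.
func connectionHash(flowLog GenericMap, common, groupA, groupB []string) []byte {
	hashA := hashFields(flowLog, groupA)
	hashB := hashFields(flowLog, groupB)
	if bytes.Compare(hashA, hashB) > 0 {
		hashA, hashB = hashB, hashA
	}
	total := fnv.New32a()
	total.Write(hashFields(flowLog, common))
	total.Write(hashA)
	total.Write(hashB)
	return total.Sum(nil)
}

// exampleFlow builds a TCP flow log with illustrative field names.
func exampleFlow(src string, srcPort int, dst string, dstPort int) GenericMap {
	return GenericMap{
		"SrcAddr": src, "SrcPort": srcPort,
		"DstAddr": dst, "DstPort": dstPort,
		"Proto": 6,
	}
}

// exampleHash applies the classic 5-tuple grouping to a flow log.
func exampleHash(flowLog GenericMap) []byte {
	return connectionHash(flowLog,
		[]string{"Proto"},
		[]string{"SrcAddr", "SrcPort"},
		[]string{"DstAddr", "DstPort"})
}

func main() {
	forward := exampleFlow("10.0.0.1", 34567, "10.0.0.2", 80)
	reverse := exampleFlow("10.0.0.2", 80, "10.0.0.1", 34567)
	fmt.Printf("forward: %x\n", exampleHash(forward))
	fmt.Printf("reverse: %x\n", exampleHash(reverse))
}
```

Because the ephemeral source port is part of group A, each of the browser's parallel connections in the example above would still produce its own hash. Merging them would require leaving the port out of the field groups.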

@ronensc ronensc merged commit 9ea2a31 into netobserv:main May 24, 2022
@ronensc ronensc deleted the conntrack-hash branch May 24, 2022 06:59