Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EnableLogging and LogLabel supports for Node NetworkPolicy #6626

Merged
merged 1 commit into from
Oct 10, 2024

Conversation

hongliangl
Copy link
Contributor

@hongliangl hongliangl commented Aug 23, 2024

For #6525

This commit introduces limited support for traffic logging in Node
NetworkPolicy. The limitations are:

  • Traffic logs are written only to the system log (not managed by Antrea).
    Users can filter logs using syslog filters.
  • The LogLabel for Node NetworkPolicy is restricted to a maximum of
    12 characters.

Node NetworkPolicy's data path is implemented via iptables. An iptables
"non-terminating target" LOG is added before the final matching rule to
log packets to the system kernel log. The logs provide packet match details,
such as:

Sep  2 10:31:07 k8s-node-control-plane kernel: [6657320.789675] Antrea:I:Allow:allow-http:IN=ens224 OUT= MAC=00:50:56:a7:fb:18:00:50:56:a7:23:47:08:00 SRC=10.10.0.10 DST=192.168.240.200 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52813 DF PROTO=TCP SPT=57658 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0
Sep  2 10:31:11 k8s-node-control-plane kernel: [6657324.899219] Antrea:I:Drop:default-drop:IN=ens224 OUT= MAC=00:50:56:a7:fb:18:00:50:56:a7:23:47:08:00 SRC=192.168.240.201 DST=192.168.240.200 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27486 DF PROTO=TCP SPT=33152 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0

The log prefix (e.g., Antrea:I:Allow:allow-http:) is up to 29 characters long
and includes a user-provided log label (up to 12 characters). The log prefix format:

|---1--| |2| |---3--| |----------4--------|
|Antrea|:|I|:|Reject|:|user-provided label|:|
|6     |1|1|1|4-6   |1|1-12               |1|
  • Part 1: Fixed, "Antrea"
  • Part 2: Direction, "I" (In) or "O" (Out)
  • Part 3: Action, "Allow", "Drop", or "Reject"
  • Part 4: User-provided log label, up to 12 characters

@hongliangl hongliangl added the action/release-note Indicates a PR that should be included in release notes. label Aug 23, 2024
@hongliangl hongliangl added this to the Antrea v2.2 release milestone Aug 23, 2024
@hongliangl hongliangl force-pushed the 20240822-nnp-enable-logging branch 2 times, most recently from 82bc18c to 1589c2a Compare September 2, 2024 11:23
@hongliangl hongliangl marked this pull request as ready for review September 2, 2024 11:23
@hongliangl hongliangl force-pushed the 20240822-nnp-enable-logging branch 2 times, most recently from a829204 to 43efbe4 Compare September 3, 2024 03:02
pkg/controller/networkpolicy/validate.go Outdated Show resolved Hide resolved
docs/antrea-node-network-policy.md Outdated Show resolved Hide resolved
docs/antrea-node-network-policy.md Outdated Show resolved Hide resolved
docs/antrea-node-network-policy.md Outdated Show resolved Hide resolved
docs/antrea-node-network-policy.md Outdated Show resolved Hide resolved
docs/antrea-node-network-policy.md Outdated Show resolved Hide resolved
// used as iptables log prefix. According to https://ipset.netfilter.org/iptables-extensions.man.html, the prefix is
// up to 29 letters long.
if len(logLabel) > 28 {
klog.InfoS("The log label exceeds 28 characters and will be truncated", "logLabel", logLabel)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we log this in the validation webhook (instead or in addition)? cc @tnqn for his opinion

Copy link
Contributor Author

@hongliangl hongliangl Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with validating with a webhook.

One more thing, we need to add some basic information about the rule to the logLabel which will be used as iptables log prefix. The worst case is like the following:


| Antrea |:|Out|:|Reject|:| [user-provided label] |:|
| 6      |1|3  |1|6     |1| n                     |1|

As a result, the user-provided log label (n) is up to 10 characters long. Will that be sufficient?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that having a fixed-limit, regardless of the rule direction or the rule action, may be the right approach.
Maybe we could replace "Out" with "O" and "In" with "I". I would give us 2 extra characters (making the limit 12 for the user-provided label), without really compromising readability too much. I was thinking of doing the same for the action, but I think it would hurt readability too much?

12 characters is not much, but we are heavily constrained by the implementation.
I was initially thinking of failing in the validation webhook if the limit is exceeded, but I now feel like it is better to just log a message warning the user. I'd log it once in the webhook (controller) and once in the agent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd log it once in the webhook (controller) and once in the agent.

Do you mean that something should be shown when a Node NetworkPolicy with a log label over 12 characters?

I made some changes to the validator. If I understood correctly, what I expected is that: when I use kubectl to apply a Node NetworkPolicy with a log label over 12 characters, a warning message will be shown in the terminal. It seems that there is something wrong, after applying such Node NetworkPolicy, nothing is shown in the terminal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you will need to use the Warnings field of the AdmissionResponse: https://pkg.go.dev/k8s.io/api/admission/v1#AdmissionResponse

I am not sure we have done it in the past. Is this what you tried?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! That's what I need!

// used as iptables log prefix. According to https://ipset.netfilter.org/iptables-extensions.man.html, the prefix is
// up to 29 letters long.
if len(logLabel) > 28 {
klog.InfoS("The log label exceeds 28 characters and will be truncated", "logLabel", logLabel)
Copy link
Contributor

@luolanzone luolanzone Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log is a little misleading, probably better to include the logLabel with prefix and the original one in Node NP.

Suggested change
klog.InfoS("The log label exceeds 28 characters and will be truncated", "logLabel", logLabel)
klog.InfoS("The log label exceeds 28 characters and will be truncated", "logLabel", logLabel, "rule.LogLabel", rule.LogLabel)

I am wondering will this kind of truncation has any side-affect? this label seems likely to be duplicated in different Node NPs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this label seems likely to be duplicated in different Node NPs.

Do you mean that different Node NPs that have that same logLabels?

@hongliangl hongliangl force-pushed the 20240822-nnp-enable-logging branch 3 times, most recently from 3dfb4ee to 5df6efe Compare September 10, 2024 03:32
@hongliangl hongliangl force-pushed the 20240822-nnp-enable-logging branch 3 times, most recently from 503637b to fa40171 Compare September 10, 2024 07:34
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good on my side, only minor comments

docs/antrea-node-network-policy.md Outdated Show resolved Hide resolved
Comment on lines 89 to 90
EnableLogging: true,
LogLabel: "ingress1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just want to make sure that we still have a policy rule for which EnableLogging is false, for the sake of testing?

Copy link
Contributor Author

@hongliangl hongliangl Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just reminded me. I left one policy with EnableLogging set to false and another policy with EnableLogging set to true but with LogLabel left empty, for the sake of testing.

pkg/controller/networkpolicy/validate.go Outdated Show resolved Hide resolved
@hongliangl hongliangl force-pushed the 20240822-nnp-enable-logging branch 2 times, most recently from 446c1da to 4fef712 Compare September 11, 2024 06:26
antoninbas
antoninbas previously approved these changes Sep 11, 2024
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good on my side
@luolanzone do you want to do another review?

@luolanzone
Copy link
Contributor

Looks good on my side @luolanzone do you want to do another review?

Probably no time this week, I will check with hongliang offline for the question I posted. @antoninbas you can go ahead. Thanks.

@antoninbas
Copy link
Contributor

/test-all

@antoninbas
Copy link
Contributor

@hongliangl I just realized, shouldn't we have an e2e test for this? Without it we only have unit tests and no guarantee that it actually works

@hongliangl
Copy link
Contributor Author

@hongliangl I just realized, shouldn't we have an e2e test for this? Without it we only have unit tests and no guarantee that it actually works

Make sense, I'll add.

docs/antrea-node-network-policy.md Outdated Show resolved Hide resolved
docs/antrea-node-network-policy.md Outdated Show resolved Hide resolved
docs/antrea-node-network-policy.md Outdated Show resolved Hide resolved
pkg/controller/networkpolicy/validate.go Outdated Show resolved Hide resolved
pkg/controller/networkpolicy/validate.go Outdated Show resolved Hide resolved
pkg/controller/networkpolicy/validate.go Outdated Show resolved Hide resolved
test/e2e/nodenetworkpolicy_test.go Outdated Show resolved Hide resolved
assert.EventuallyWithT(t, func(tc *assert.CollectT) {
// After generating some traffic, there should be corresponding kernel logs on hostNetwork Pod x/a, and the kernel
// logs should contain the generated random logLabel and not contain the part that should be truncated.
stdout, stderr, err = data.RunCommandFromPod(antreaNamespace, antreaPodName, agentContainerName, []string{"dmesg"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to limit the output of dmesg, the default output might be large. Maybe dmesg | tail -n 200?
otherwise, the following codes will continuously checking the logs and fail at the last.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense

@hongliangl hongliangl force-pushed the 20240822-nnp-enable-logging branch 2 times, most recently from 9a163b2 to dc77645 Compare September 28, 2024 06:06
test/e2e/nodenetworkpolicy_test.go Outdated Show resolved Hide resolved
test/e2e/nodenetworkpolicy_test.go Outdated Show resolved Hide resolved
re4 := regexp.MustCompile(fmt.Sprintf(`(?s)%s.*SRC=%s DST=%s`, expectedLogPrefix, srcIP4, dstIP4))
re6 := regexp.MustCompile(fmt.Sprintf(`(?s)%s.*SRC=%s DST=%s`, expectedLogPrefix, srcIP6, dstIP6))

assert.EventuallyWithT(t, func(tc *assert.CollectT) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we usuall just name the *assert.CollectT variable t

// logs should contain the generated random logLabel and not contain the part that should be truncated.
stdout, stderr, err = data.RunCommandFromPod(antreaNamespace, antreaPodName, agentContainerName, []string{"sh", "-c", "dmesg | tail -n 200"})
if err != nil || stderr != "" {
t.Fatalf("Got error: %v, %v", err, stderr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not technically correct. You cannot call Fatal / FailNow from here, as it does not run in the main goroutine. This is why for such cases we tend to stick to wait.Poll, until the fix to stretchr/testify#1396 is part of a testify release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have reminded me. I have changed to use wait.Poll and left a TODO to replace wait.Poll with assert.Eventually after stretchr/testify#1396 is merged.

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, you just need to address the linter error

test/e2e/nodenetworkpolicy_test.go Outdated Show resolved Hide resolved
luolanzone
luolanzone previously approved these changes Oct 8, 2024
Copy link
Contributor

@luolanzone luolanzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one comment.

re4 := regexp.MustCompile(fmt.Sprintf(`(?s)%s.*SRC=%s DST=%s`, expectedLogPrefix, srcIP4, dstIP4))
re6 := regexp.MustCompile(fmt.Sprintf(`(?s)%s.*SRC=%s DST=%s`, expectedLogPrefix, srcIP6, dstIP6))

// TODO: Replace wait.PollUntilContextTimeout with assert.EventuallyWithT after https://github.com/stretchr/testify/issues/1396 is resolved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the issue is already resolved, please check if you want to update in this PR or with a new PR to resolve this TODO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the issue is already resolved

Exactly, but the latest release of github.com/stretchr/testify is v1.9.0, which doesn't include the fix for the issue.

@antoninbas
Copy link
Contributor

/test-kind-all

antoninbas
antoninbas previously approved these changes Oct 8, 2024
This commit introduces limited support for traffic logging in Node
NetworkPolicy. The limitations are:

- Traffic logs are written only to the system log (not managed by Antrea).
  Users can filter logs using syslog filters.
- The `LogLabel` for Node NetworkPolicy is restricted to a maximum of
  12 characters.

Node NetworkPolicy's data path is implemented via iptables. An iptables
"non-terminating target" `LOG` is added before the final matching rule to
log packets to the system kernel log. The logs provide packet match details,
such as:

```
Sep  2 10:31:07 k8s-node-control-plane kernel: [6657320.789675] Antrea:I:Allow:allow-http:IN=ens224 OUT= MAC=00:50:56:a7:fb:18:00:50:56:a7:23:47:08:00 SRC=10.10.0.10 DST=192.168.240.200 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52813 DF PROTO=TCP SPT=57658 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0
Sep  2 10:31:11 k8s-node-control-plane kernel: [6657324.899219] Antrea:I:Drop:default-drop:IN=ens224 OUT= MAC=00:50:56:a7:fb:18:00:50:56:a7:23:47:08:00 SRC=192.168.240.201 DST=192.168.240.200 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27486 DF PROTO=TCP SPT=33152 DPT=80 WINDOW=64240 RES=0x00 SYN URGP=0
```

The log prefix (e.g., `Antrea:I:Allow:allow-http:`) is up to 29 characters long
and includes a user-provided log label (up to 12 characters). The log prefix format:

```
|---1--| |2| |---3--| |----------4--------|
|Antrea|:|I|:|Reject|:|user-provided label|:|
|6     |1|1|1|4-6   |1|1-12               |1|
```

- Part 1: Fixed, "Antrea"
- Part 2: Direction, "I" (In) or "O" (Out)
- Part 3: Action, "Allow", "Drop", or "Reject"
- Part 4: User-provided log label, up to 12 characters

Signed-off-by: Hongliang Liu <hongliang.liu@broadcom.com>
@antoninbas
Copy link
Contributor

@hongliangl was the last push just a rebase?

@antoninbas
Copy link
Contributor

/test-kind-all

@hongliangl
Copy link
Contributor Author

hongliangl commented Oct 10, 2024

@hongliangl was the last push just a rebase?

A rebase to resolve conflicts

@antoninbas
Copy link
Contributor

/test-kind-all

@antoninbas
Copy link
Contributor

/test-all

@antoninbas antoninbas merged commit a419c8c into antrea-io:main Oct 10, 2024
56 of 63 checks passed
@hongliangl hongliangl deleted the 20240822-nnp-enable-logging branch October 11, 2024 02:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants