Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix that AntreaProxy could unintentionally delete conntrack entries in zone 0 #6193

Merged

Conversation

hongliangl
Copy link
Contributor

@hongliangl hongliangl commented Apr 7, 2024

This is a subsequent PR for #5112. As mentioned in #5112:

Due to the restriction of the go library 'netlink', there is no API to specify a
target zone. As a result, when deleting a stale conntrack entry with a
destination port (such as NodePort), not only will the conntrack entry whose
destination port is the port added by AntreaProxy be deleted, but also the
conntrack entry that is not added by AntreaProxy will be deleted. This behavior
is unexpected, as only the conntrack entries added by AntreaProxy should be
deleted.

This PR resolves the issue by integrating a CT zone filter, now available in
the latest Go library netlink. Leveraging this feature, AntreaProxy can
accurately delete stale UDP conntrack entries.

@hongliangl hongliangl marked this pull request as draft April 7, 2024 11:22
@hongliangl hongliangl marked this pull request as ready for review April 11, 2024 01:20
@hongliangl hongliangl force-pushed the 20240407-stale-udp-connection-delete branch from 4dbda3d to 6fd4aec Compare April 11, 2024 01:28
@hongliangl hongliangl added the area/proxy Issues or PRs related to proxy functions in Antrea label Apr 11, 2024
@hongliangl hongliangl requested a review from tnqn April 11, 2024 01:29
@tnqn tnqn added the action/release-note Indicates a PR that should be included in release notes. label Apr 11, 2024
tnqn
tnqn previously approved these changes Apr 11, 2024
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Apr 11, 2024

Some tests failed, can you check?

@hongliangl hongliangl changed the title Fix that AntreaProxy could unintentionally delete conntrack entries in zone 0 Promote feature gate CleanupStaleUDPSvcConntrack to Beta Apr 11, 2024
@hongliangl hongliangl force-pushed the 20240407-stale-udp-connection-delete branch from 6fd4aec to bb6ba90 Compare April 11, 2024 03:23
@hongliangl
Copy link
Contributor Author

Some tests failed, can you check?

Sure. Besides, I was wondering if we could promote the feature gate CleanupStaleUDPSvcConntrack to Beta stage?

@hongliangl hongliangl force-pushed the 20240407-stale-udp-connection-delete branch 2 times, most recently from 09305f1 to fc64dcc Compare April 11, 2024 03:29
@tnqn
Copy link
Member

tnqn commented Apr 11, 2024

Some tests failed, can you check?

Sure. Besides, I was wondering if we could promote the feature gate CleanupStaleUDPSvcConntrack to Beta stage?

How about running it for a while in CI first? I want to ensure this doesn't cause unexpected behaviors, though it doesn't seem to. Promoting a feature shouldn't be in the same PR as a bugfix, the former needs to be more careful and the latter can potentially be backported (though not this case) while the former will never.

@hongliangl hongliangl force-pushed the 20240407-stale-udp-connection-delete branch from fc64dcc to 233faac Compare April 11, 2024 05:30
@hongliangl hongliangl changed the title Promote feature gate CleanupStaleUDPSvcConntrack to Beta Fix that AntreaProxy could unintentionally delete conntrack entries in zone 0 Apr 11, 2024
@hongliangl hongliangl added the action/backport Indicates a PR that requires backports. label Apr 11, 2024
@hongliangl hongliangl force-pushed the 20240407-stale-udp-connection-delete branch 2 times, most recently from 17bb6f3 to 68e1082 Compare April 11, 2024 08:09
@hongliangl
Copy link
Contributor Author

hongliangl commented Apr 11, 2024

Some tests failed, can you check?

Sure. Besides, I was wondering if we could promote the feature gate CleanupStaleUDPSvcConntrack to Beta stage?

How about running it for a while in CI first? I want to ensure this doesn't cause unexpected behaviors, though it doesn't seem to. Promoting a feature shouldn't be in the same PR as a bugfix, the former needs to be more careful and the latter can potentially be backported (though not this case) while the former will never.

Got that. I have resolved the test failure of unit tests and integration tests, and I found that e2e test E2e tests on a Kind cluster on Linux got a failure. I'm not sure if it is caused by the bump of netlink. I try to run this test several times to see that if the failed tests can be reproduced every time.

@hongliangl
Copy link
Contributor Author

@tnqn I found that failure of test E2e tests on a Kind cluster on Linux was caused by the test TestAntreaPolicy/TestGroupValidationWebhook/Case=CreateInvalidTier. The following is the output. I'm not sure if this is caused by the bump of netlink. Have we had such test failures before?

=== RUN   TestAntreaPolicy/TestGroupValidationWebhook/Case=CreateInvalidTier
I0411 08:28:10.128451   23360 k8s_util.go:758] Creating tier tier-prio-20
I0411 08:28:10.138881   23360 k8s_util.go:758] Creating tier another-tier-prio-20
E0411 08:28:10.146187   23360 antreapolicy_test.go:82] Tiers created with overlapping priorities
I0411 08:28:10.153528   23360 k8s_util.go:1175] Deleting test Namespace {x-essto3kz map[]}
I0411 08:28:16.164888   23360 framework.go:793] Deleting Namespace x-essto3kz took 6.011230475s
I0411 08:28:16.164939   23360 k8s_util.go:1175] Deleting test Namespace {y-essto3kz map[]}
I0411 08:28:22.173253   23360 framework.go:793] Deleting Namespace y-essto3kz took 6.008293191s
I0411 08:28:22.174063   23360 k8s_util.go:1175] Deleting test Namespace {z-essto3kz map[]}
I0411 08:28:28.184114   23360 framework.go:793] Deleting Namespace z-essto3kz took 6.009502881s
    antreapolicy_test.go:84: test failed: Tiers created with overlapping priorities

See L506-L516 in https://github.com/antrea-io/antrea/actions/runs/8643534991/job/23697191286?pr=6193 for more information.

Got the same failure at L505-515 in attempt 2 https://github.com/antrea-io/antrea/actions/runs/8643534991/job/23700838340?pr=6193

Got the same failure at L564-574 in attempt 4 https://github.com/antrea-io/antrea/actions/runs/8643534991/job/23715419360?pr=6193

@hongliangl
Copy link
Contributor Author

/test-e2e

1 similar comment
@hongliangl
Copy link
Contributor Author

/test-e2e

@hongliangl hongliangl force-pushed the 20240407-stale-udp-connection-delete branch from 68e1082 to 3f5be8c Compare April 24, 2024 07:40
@@ -33,6 +33,7 @@ import (
utilnet "k8s.io/utils/net"

"antrea.io/antrea/pkg/agent/config"
"antrea.io/antrea/pkg/agent/openflow"
"antrea.io/antrea/pkg/agent/servicecidr"
"antrea.io/antrea/pkg/agent/types"
"antrea.io/antrea/pkg/agent/util/ipset"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need to fix if r.Dst == nil in this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@hongliangl hongliangl force-pushed the 20240407-stale-udp-connection-delete branch 2 times, most recently from bd253ea to d4928cd Compare May 23, 2024 09:38
tnqn
tnqn previously approved these changes May 23, 2024
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hongliangl
Copy link
Contributor Author

/test-latest-conformance

1 similar comment
@hongliangl
Copy link
Contributor Author

/test-latest-conformance

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented May 28, 2024

/skip-all

@tnqn tnqn merged commit df1655c into antrea-io:main May 28, 2024
55 of 58 checks passed
hongliangl added a commit to hongliangl/antrea that referenced this pull request May 28, 2024
In antrea-io#5112, due to the limitations of the Go netlink library, AntreaProxy
would unconditionally delete conntrack entries added by kube-proxy in
conntrack zone 0. AntreaProxy was supposed to only delete its own entries
in conntrack zones 65520 or 65521. To address this, a feature was added
to isolate the relevant code.

After the merge of antrea-io#6193, the netlink library was updated, allowing
AntreaProxy to precisely delete conntrack entries in zones 65520 or 65521.
It is now safe to enable the corresponding code by default.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
@hongliangl hongliangl deleted the 20240407-stale-udp-connection-delete branch May 28, 2024 16:02
hongliangl added a commit to hongliangl/antrea that referenced this pull request May 29, 2024
In antrea-io#5112, due to the limitations of the Go netlink library, AntreaProxy
would unconditionally delete conntrack entries added by kube-proxy in
conntrack zone 0. AntreaProxy was supposed to only delete its own entries
in conntrack zones 65520 or 65521. To address this, a feature was added
to isolate the relevant code.

After the merge of antrea-io#6193, the netlink library was updated, allowing
AntreaProxy to precisely delete conntrack entries in zones 65520 or 65521.
It is now safe to enable the corresponding code by default.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this pull request May 29, 2024
In antrea-io#5112, due to the limitations of the Go netlink library, AntreaProxy
would unconditionally delete conntrack entries added by kube-proxy in
conntrack zone 0. AntreaProxy was supposed to only delete its own entries
in conntrack zones 65520 or 65521. To address this, a feature was added
to isolate the relevant code.

After the merge of antrea-io#6193, the netlink library was updated, allowing
AntreaProxy to precisely delete conntrack entries in zones 65520 or 65521.
It is now safe to enable the corresponding code by default.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this pull request May 30, 2024
In antrea-io#5112, due to the limitations of the Go netlink library, AntreaProxy
would unconditionally delete conntrack entries added by kube-proxy in
conntrack zone 0. AntreaProxy was supposed to only delete its own entries
in conntrack zones 65520 or 65521. To address this, a feature was added
to isolate the relevant code.

After the merge of antrea-io#6193, the netlink library was updated, allowing
AntreaProxy to precisely delete conntrack entries in zones 65520 or 65521.
It is now safe to enable the corresponding code by default.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
@gran-vmv
Copy link
Contributor

gran-vmv commented Jun 5, 2024

@hongliangl Do we need to backport this PR? My PR #6321 rely on this and it cannot be backported without this PR.

@hongliangl
Copy link
Contributor Author

You could back them together.

@hongliangl
Copy link
Contributor Author

I meant multiple PRs could be backported together in a PR.

hongliangl added a commit to hongliangl/antrea that referenced this pull request Jun 5, 2024
…n zone 0 (antrea-io#6193)

This is a subsequent PR for antrea-io#5112. As mentioned in antrea-io#5112:

> Due to the restriction of the go library 'netlink', there is no API to specify a
  target zone. As a result, when deleting a stale conntrack entry with a
  destination port (such as NodePort), not only will the conntrack entry whose
  destination port is the port added by AntreaProxy be deleted, but also the
  conntrack entry that is not added by AntreaProxy will be deleted. This behavior
  is unexpected, as only the conntrack entries added by AntreaProxy should be
  deleted.

This PR resolves the issue by integrating a CT zone filter, now available in
the latest Go library `netlink`. Leveraging this feature, AntreaProxy can
accurately delete stale UDP conntrack entries.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this pull request Jun 5, 2024
…n zone 0 (antrea-io#6193)

This is a subsequent PR for antrea-io#5112. As mentioned in antrea-io#5112:

> Due to the restriction of the go library 'netlink', there is no API to specify a
  target zone. As a result, when deleting a stale conntrack entry with a
  destination port (such as NodePort), not only will the conntrack entry whose
  destination port is the port added by AntreaProxy be deleted, but also the
  conntrack entry that is not added by AntreaProxy will be deleted. This behavior
  is unexpected, as only the conntrack entries added by AntreaProxy should be
  deleted.

This PR resolves the issue by integrating a CT zone filter, now available in
the latest Go library `netlink`. Leveraging this feature, AntreaProxy can
accurately delete stale UDP conntrack entries.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this pull request Jun 5, 2024
…n zone 0 (antrea-io#6193)

This is a subsequent PR for antrea-io#5112. As mentioned in antrea-io#5112:

> Due to the restriction of the go library 'netlink', there is no API to specify a
  target zone. As a result, when deleting a stale conntrack entry with a
  destination port (such as NodePort), not only will the conntrack entry whose
  destination port is the port added by AntreaProxy be deleted, but also the
  conntrack entry that is not added by AntreaProxy will be deleted. This behavior
  is unexpected, as only the conntrack entries added by AntreaProxy should be
  deleted.

This PR resolves the issue by integrating a CT zone filter, now available in
the latest Go library `netlink`. Leveraging this feature, AntreaProxy can
accurately delete stale UDP conntrack entries.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this pull request Jun 5, 2024
…n zone 0 (antrea-io#6193)

This is a subsequent PR for antrea-io#5112. As mentioned in antrea-io#5112:

> Due to the restriction of the go library 'netlink', there is no API to specify a
  target zone. As a result, when deleting a stale conntrack entry with a
  destination port (such as NodePort), not only will the conntrack entry whose
  destination port is the port added by AntreaProxy be deleted, but also the
  conntrack entry that is not added by AntreaProxy will be deleted. This behavior
  is unexpected, as only the conntrack entries added by AntreaProxy should be
  deleted.

This PR resolves the issue by integrating a CT zone filter, now available in
the latest Go library `netlink`. Leveraging this feature, AntreaProxy can
accurately delete stale UDP conntrack entries.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this pull request Jun 5, 2024
…n zone 0 (antrea-io#6193)

This is a subsequent PR for antrea-io#5112. As mentioned in antrea-io#5112:

> Due to the restriction of the go library 'netlink', there is no API to specify a
  target zone. As a result, when deleting a stale conntrack entry with a
  destination port (such as NodePort), not only will the conntrack entry whose
  destination port is the port added by AntreaProxy be deleted, but also the
  conntrack entry that is not added by AntreaProxy will be deleted. This behavior
  is unexpected, as only the conntrack entries added by AntreaProxy should be
  deleted.

This PR resolves the issue by integrating a CT zone filter, now available in
the latest Go library `netlink`. Leveraging this feature, AntreaProxy can
accurately delete stale UDP conntrack entries.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
tnqn pushed a commit that referenced this pull request Jun 6, 2024
…n zone 0 (#6193) (#6406)

This is a subsequent PR for #5112. As mentioned in #5112:

> Due to the restriction of the go library 'netlink', there is no API to specify a
  target zone. As a result, when deleting a stale conntrack entry with a
  destination port (such as NodePort), not only will the conntrack entry whose
  destination port is the port added by AntreaProxy be deleted, but also the
  conntrack entry that is not added by AntreaProxy will be deleted. This behavior
  is unexpected, as only the conntrack entries added by AntreaProxy should be
  deleted.

This PR resolves the issue by integrating a CT zone filter, now available in
the latest Go library `netlink`. Leveraging this feature, AntreaProxy can
accurately delete stale UDP conntrack entries.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
tnqn pushed a commit that referenced this pull request Jun 6, 2024
…n zone 0 (#6193) (#6404)

This is a subsequent PR for #5112. As mentioned in #5112:

> Due to the restriction of the go library 'netlink', there is no API to specify a
  target zone. As a result, when deleting a stale conntrack entry with a
  destination port (such as NodePort), not only will the conntrack entry whose
  destination port is the port added by AntreaProxy be deleted, but also the
  conntrack entry that is not added by AntreaProxy will be deleted. This behavior
  is unexpected, as only the conntrack entries added by AntreaProxy should be
  deleted.

This PR resolves the issue by integrating a CT zone filter, now available in
the latest Go library `netlink`. Leveraging this feature, AntreaProxy can
accurately delete stale UDP conntrack entries.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
tnqn pushed a commit that referenced this pull request Jun 6, 2024
…n zone 0 (#6193) (#6405)

This is a subsequent PR for #5112. As mentioned in #5112:

> Due to the restriction of the go library 'netlink', there is no API to specify a
  target zone. As a result, when deleting a stale conntrack entry with a
  destination port (such as NodePort), not only will the conntrack entry whose
  destination port is the port added by AntreaProxy be deleted, but also the
  conntrack entry that is not added by AntreaProxy will be deleted. This behavior
  is unexpected, as only the conntrack entries added by AntreaProxy should be
  deleted.

This PR resolves the issue by integrating a CT zone filter, now available in
the latest Go library `netlink`. Leveraging this feature, AntreaProxy can
accurately delete stale UDP conntrack entries.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this pull request Jun 6, 2024
In antrea-io#5112, due to the limitations of the Go netlink library, AntreaProxy
would unconditionally delete conntrack entries added by kube-proxy in
conntrack zone 0. AntreaProxy was supposed to only delete its own entries
in conntrack zones 65520 or 65521. To address this, a feature was added
to isolate the relevant code.

After the merge of antrea-io#6193, the netlink library was updated, allowing
AntreaProxy to precisely delete conntrack entries in zones 65520 or 65521.
It is now safe to enable the corresponding code by default.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this pull request Jun 7, 2024
In antrea-io#5112, due to the limitations of the Go netlink library, AntreaProxy
would unconditionally delete conntrack entries added by kube-proxy in
conntrack zone 0. AntreaProxy was supposed to only delete its own entries
in conntrack zones 65520 or 65521. To address this, a feature was added
to isolate the relevant code.

After the merge of antrea-io#6193, the netlink library was updated, allowing
AntreaProxy to precisely delete conntrack entries in zones 65520 or 65521.
It is now safe to enable the corresponding code by default.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this pull request Jun 12, 2024
In antrea-io#5112, due to the limitations of the Go netlink library, AntreaProxy
would unconditionally delete conntrack entries added by kube-proxy in
conntrack zone 0. AntreaProxy was supposed to only delete its own entries
in conntrack zones 65520 or 65521. To address this, a feature was added
to isolate the relevant code.

After the merge of antrea-io#6193, the netlink library was updated, allowing
AntreaProxy to precisely delete conntrack entries in zones 65520 or 65521.
It is now safe to enable the corresponding code by default.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
tnqn pushed a commit that referenced this pull request Jun 13, 2024
In #5112, due to the limitations of the Go netlink library, AntreaProxy
would unconditionally delete conntrack entries added by kube-proxy in
conntrack zone 0. AntreaProxy was supposed to only delete its own entries
in conntrack zones 65520 or 65521. To address this, a feature was added
to isolate the relevant code.

After the merge of #6193, the netlink library was updated, allowing
AntreaProxy to precisely delete conntrack entries in zones 65520 or 65521.
It is now safe to enable the corresponding code by default.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/backport Indicates a PR that requires backports. action/release-note Indicates a PR that should be included in release notes. area/proxy Issues or PRs related to proxy functions in Antrea
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants